ABSTRACT


The thesis is divided into two parts. In the first part, we describe and analyze
several new VLSI layouts for the shuffle-exchange graph. These include

1) an asymptotically optimal $O(N^2/\log^2 N)$-area layout for the $N$-node
shuffle-exchange graph, and

2) several practical layouts for small shuffle-exchange graphs.

The new layouts require substantially less area than previously known layouts
and can serve as the basis for designing large scale shuffle-exchange chips.

In the second part of the thesis, we develop general methods for proving lower
bounds on the layout area, crossing number, bisection width and maximum edge
length of VLSI networks. Among other things, we use these methods to find

1) an $N$-node planar graph which has layout area $\Theta(N \log N)$ and maximum
edge length $\Theta(N^{1/2}/\log^{1/2} N)$,

2) an $N$-node graph with an $O(N^{1/2})$-separator which has layout area
$\Theta(N \log^2 N)$ and maximum edge length $\Theta(N^{1/2} \log N/\log\log N)$, and

3) an $N$-node graph with an $O(N^{\alpha})$-separator (for $\alpha > 1/2$) which has maximum
edge length $\Theta(N^{\alpha})$.

The area results indicate that some graphs with $O(N^{1/2})$-separators (and, in
particular, some planar graphs) do not have linear-area layouts, thus disproving a
popular conjecture. The edge length bounds indicate that the layouts of some
networks must have very long wires (possibly as long as the width of the layout).

INTRODUCTION

The  recent  engineering  advances  in  Very  Large  Scale  Integrated  (VLSI)  circuitry 
have  made  it  possible  to  wire  tens  of  thousands  of  transistors  onto  a  single  chip.  In 
the  near  future,  it  is  expected  that  fabrication  of  chips  containing  millions  of 
transistors  will  be  commonplace  [MC80].  In  order  that  this  massive  computational 
resource  be  efficiently  utilized,  theoretical  researchers  have  been  actively  trying  to 
answer questions such as:

1) "What is a good model for VLSI chip design and computation?",

2) "What communications networks can best perform important operations
such as sorting, matrix multiplication and discrete Fourier transform?" and

3) "What is the best method of laying out a network on a chip?"

Several  models  have  been  proposed  for  VLSI  computation  [T80,LS81,CM81]. 
The  most  widely  accepted  is  due  to  Thompson  and  is  known  as  the  Thompson 
model  [T79,T80].  Thompson’s  model  of  a  VLSI  chip  is  quite  simple.  The  chip  is 
presumed  to  consist  of  a  grid  of  vertical  and  horizontal  tracks  which  are  spaced 
apart  by  unit  intervals.  Processors  are  viewed  as  points  and  are  located  only  at  the 
intersection  of  grid  tracks.  Wires  are  routed  through  the  tracks  in  order  to  connect 
pairs  of  processors.  Although  a  wire  in  a  horizontal  track  is  allowed  to  cross  a  wire 
in a vertical track, pairs of wires are not allowed to overlap for any distance (i.e.,
they cannot overlap in the same track). Further, wires are not allowed to overlap
processors  to  which  they  are  not  linked.  As  an  example,  we  have  drawn  a 
Thompson model layout of a 4-processor network in Figure 1.



Figure 1: A Thompson model layout of a 4-processor network in
which  each  processor  is  linked  to  every  other  processor. 

Much has also been accomplished in the way of finding good communications
networks for VLSI. For example, the complete binary tree [MC80], the
2-dimensional mesh [TK77,KL78,MC80], the cube-connected-cycles graph [PV79]
and the shuffle-exchange graph [S71,L75,L76,NS79,P80,S80,SR80a,T79,T80] are all
known to be capable of performing a wide range of operations. The
shuffle-exchange graph, in particular, is an incredibly powerful and efficient
communications network. Among other things, it can be used to compute discrete
Fourier transforms, multiply matrices, sort lists and evaluate polynomials. Except
for sorting (which requires $O(\log^2 N)$ time), these operations require no more than
logarithmic time and constant space per processor. This is exponentially faster than
the running times of the corresponding sequential algorithms and the
corresponding parallel algorithms on networks such as the 2-dimensional mesh.
Since, in addition, the processors required for these operations are quite simple, the
shuffle-exchange network is very well suited for VLSI implementation on a chip.

The shuffle-exchange graph comes in various sizes. In particular, there is an
$N$-node shuffle-exchange graph for every $N$ which is a power of two. Each node of
the $(N=2^k)$-node shuffle-exchange graph is associated with a unique $k$-bit binary
string $a_{k-1} \cdots a_0$. Two nodes $w$ and $w'$ are linked via a shuffle edge if $w'$ is a left
or right cyclic shift of $w$ (i.e., if $w = a_{k-1} \cdots a_0$ and $w' = a_{k-2} \cdots a_0 a_{k-1}$ or
$w' = a_0 a_{k-1} \cdots a_1$, respectively). Two nodes $w$ and $w'$ are linked via an
exchange edge if $w$ and $w'$ differ only in the last bit (i.e., if $w = a_{k-1} \cdots a_1 0$ and
$w' = a_{k-1} \cdots a_1 1$ or vice-versa). As an example, we have drawn the 8-node
shuffle-exchange graph in Figure 2. Note that the shuffle edges are drawn with
solid lines while the exchange edges are drawn with dashed lines. We shall follow
this convention throughout the thesis.


Figure 2: The 8-node shuffle-exchange graph
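
The definition of shuffle and exchange edges can be made concrete with a short Python sketch. This is only an illustration under our own conventions: nodes are represented as integers whose $k$-bit binary expansion is $a_{k-1} \cdots a_0$, and the function name `shuffle_exchange_graph` is hypothetical rather than taken from the thesis.

```python
def shuffle_exchange_graph(k):
    """Return (shuffle_edges, exchange_edges) for the 2**k-node graph.

    Nodes are the integers 0 .. 2**k - 1, read as k-bit strings a_{k-1}...a_0.
    Each edge is stored once as a sorted pair (u, v).
    """
    n = 1 << k
    mask = n - 1
    shuffle, exchange = set(), set()
    for w in range(n):
        # Left cyclic shift: a_{k-1} a_{k-2} ... a_0 -> a_{k-2} ... a_0 a_{k-1}.
        left = ((w << 1) & mask) | (w >> (k - 1))
        if left != w:  # 00...0 and 11...1 are their own shifts; skip self-loops
            shuffle.add((min(w, left), max(w, left)))
        # Exchange edge: flip the last bit a_0.
        exchange.add((min(w, w ^ 1), max(w, w ^ 1)))
    return shuffle, exchange


shuffle, exchange = shuffle_exchange_graph(3)
print(sorted(shuffle))
print(sorted(exchange))
```

Right cyclic shifts need no separate treatment in this sketch, since a right shift from $w'$ to $w$ is the same undirected edge as the left shift from $w$ to $w'$.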

The third question of interest to VLSI researchers ("What is the best method of
laying  out  a  network  on  a  chip?")  has  proved  to  be,  by  far,  the  most  difficult.  It  is 
also  the  subject  of  this  thesis.  In  order  to  answer  the  question  for  a  particular 
network,  we  must  do  the  following  three  things: 

1)  decide  what  it  means  for  a  layout  to  be  "good," 

2)  find  a  "good"  layout  for  the  network,  and 

3)  prove  that  the  layout  is  as  "good"  as  possible. 

Most  people  agree  that  a  "good"  layout  is  one  which  does  not  require  much 
area.  This  is  quite  reasonable  since  small  layouts  are  easier  to  wire  on  a  chip,  cost 
less  and  have  far  higher  yields  than  layouts  with  larger  amounts  of  area.  Recently, 
there  has  also  been  interest  in  designing  layouts  with  short  wires.  Although  wire 
length  considerations  are  not  as  important  as  area  considerations,  it  is  possible  that 
layouts  with  long  wires  may  be  less  efficient  and  run  slower  (due  to  longer 
transmission  times)  than  layouts  with  shorter  wires.  Both  quantities  are  easily 
expressed  in  terms  of  the  Thompson  model,  which  is  nice  from  a  mathematical 
point  of  view.  For  example,  the  layout  area  of  a  network  is  the  minimum  amount 
of  area  required  to  lay  out  the  network  in  the  Thompson  model.  (The  area  of  a 
layout  in  the  Thompson  model  is  defined  to  be  the  product  of  the  number  of 
vertical  tracks  and  the  number  of  horizontal  tracks  which  contain  a  processor  or 
wire segment of the layout.) Similarly, the maximum edge length of a network is
the  minimum  amount  of  wire  which  is  needed  to  embed  the  longest  edge  in  any 
Thompson  model  layout  of  the  network. 
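
As a small illustration of the area definition, the following Python sketch counts the occupied tracks of a layout given as a set of grid points. The representation of a layout as a list of `(x, y)` points and the name `layout_area` are our own assumptions for the example.

```python
def layout_area(occupied):
    """Area of a Thompson-model layout.

    `occupied` is an iterable of (x, y) grid points covered by a processor
    or a wire segment. The area is the number of vertical tracks in use
    times the number of horizontal tracks in use.
    """
    points = list(occupied)
    xs = {x for x, _ in points}  # vertical tracks that contain something
    ys = {y for _, y in points}  # horizontal tracks that contain something
    return len(xs) * len(ys)


# Hypothetical example: two processors at (0, 0) and (3, 0) joined by a
# straight wire along the horizontal track y = 0.
wire = [(x, 0) for x in range(4)]
print(layout_area(wire))  # 4 vertical tracks times 1 horizontal track
```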

Good layouts are known for several communications networks, including the
complete binary tree [MR79,PRS81,BL81], the 2-dimensional mesh and the
cube-connected-cycles graph [PV79]. The known layouts for the shuffle-exchange graph,
however, are not very good. Thompson [T80] was the first to find a nontrivial
layout for the shuffle-exchange graph. In particular, he found an $O(N^2/\log^{1/2} N)$-area
layout of the $N$-node shuffle-exchange graph. He also showed that any layout
for the $N$-node shuffle-exchange graph must have at least $\Omega(N^2/\log^2 N)$ area. Hoey
and Leiserson [HL80] improved the upper bound by finding an $O(N^2/\log N)$-area
layout for the $N$-node shuffle-exchange graph. Neither Thompson's nor Hoey and
Leiserson's layouts are practical, however, and neither meets Thompson's
asymptotic lower bound.

In Part I of the thesis, we find good layouts for the shuffle-exchange graph. In
particular, we describe an asymptotically optimal $O(N^2/\log^2 N)$-area layout for the
$N$-node shuffle-exchange graph. Although the layout is not optimal for small
values of $N$, we show how it can be modified in order to produce good layouts for
small shuffle-exchange graphs. As these layouts are practical, it should now be
possible to build a shuffle-exchange chip.

Finally,  we  are  left  with  the  task  of  proving  that  a  layout  which  appears  to  be 
good  is,  in  fact,  optimal.  Although  Thompson  [T79,T80],  Vuillemin  [V80]  and 
Lipton  and  Sedgewick  [LS81]  have  all  shown  how  to  prove  area  lower  bounds  for 
certain  computationally  useful  networks  (such  as  the  shuffle-exchange  graph),  it  is 
not  known  how  to  prove  such  lower  bounds  in  general.  For  example,  no  nontrivial 
lower bounds have been found for the class of graphs which have $O(N^{1/2})$-separators.
(This class includes the very important class of planar graphs.) Nor
have  any  methods  been  discovered  for  proving  nontrivial  lower  bounds  on  the 
maximum  edge  length  of  a  network. 

In  Part  II  of  the  thesis,  we  describe  several  techniques  for  proving  good  layout 
area  and  maximum  edge  length  lower  bounds.  In  particular,  we  concentrate  on 
finding  good  lower  bounds  for  the  crossing  number,  wire  area  and  maximum  edge 
crossing  of  a  network.  The  crossing  number  of  a  graph  is  the  minimum  number  of 
pairs of edges which must cross in any drawing of the graph in the plane. The
maximum  edge  crossing  of  a  graph  is  the  largest  number  of  edges  which  must  be 
crossed  by  some  edge  in  any  drawing  of  the  graph.  The  wire  area  of  a  network  is 
simply  the  minimum  amount  of  wire  which  must  be  used  to  embed  the  network  in 
the  Thompson  model.  It  is  clear  that  for  any  network, 

crossing number ≤ wire area ≤ layout area

and also that

maximum edge crossing ≤ maximum edge length.

In  addition,  the  crossing  number,  wire  area  and  maximum  edge  crossing  are 
worth  minimizing  independent  of  layout  area  and  maximum  edge  length 
considerations.  This  is  due  to  the  fact  that 

1)  chips  with  a  large  number  of  wire  crossings  (and,  in  particular,  those  with 
wires  which  cross  many  other  wires)  have  substantially  more  problems  with 
capacitive coupling (i.e., interference between overlapping wires) than do
chips  with  fewer  crossings,  and 

2)  chips  with  high  wire  area  cost  more  and  experience  lower  yields  than  do 
chips  with  lesser  wire  area. 

Unfortunately,  the  results  of  Part  II  indicate  that  the  crossing  number  and  wire 
area  are  usually  as  large  (up  to  a  constant  factor)  as  the  layout  area.  In  addition, 
the  maximum  edge  crossing  is  often  nearly  as  large  as  the  side  length  of  the  chip. 
More  importantly,  however,  crossing  number  and  wire  area  arguments  can  be  used 
to  prove  better  lower  bounds  on  the  layout  area  and  maximum  edge  length  than 
were  possible  with  existing  techniques.  In  particular,  we  will  use  such  arguments 
to  find 

1) an $N$-node planar graph which has layout area $\Theta(N \log N)$ and maximum
edge length $\Theta(N^{1/2}/\log^{1/2} N)$,

2) an $N$-node graph with an $O(N^{1/2})$-separator which has layout area
$\Theta(N \log^2 N)$ and maximum edge length $\Theta(N^{1/2} \log N/\log\log N)$, and

3) an $N$-node graph with an $O(N^{\alpha})$-separator (for $\alpha > 1/2$) which has maximum
edge length $\Theta(N^{\alpha})$.

The area results indicate that not all graphs with $O(N^{1/2})$-separators (and, in
particular,  not  all  planar  graphs)  can  be  laid  out  in  linear  area,  thus  disproving  a 
popular  conjecture.  The  edge  length  bounds  indicate  that  layouts  of  certain 
networks  must  have  some  very  long  wires  (possibly  even  as  long  as  the  side  length 
of  the  layout).  Taken  together,  these  results  answer  all  of  the  previously  open 
questions  concerning  layout  area  and  maximum  edge  length  of  VLSI  networks 
with  known  separators. 


PART  I 


LAYOUTS FOR THE SHUFFLE-EXCHANGE GRAPH


CHAPTER  1 


REVIEW  OF  KNOWN  LAYOUTS 


In this chapter, we review the known layouts of the shuffle-exchange graph. In
section 1.1, we describe Thompson's [T80] straightforward $O(N^2/\log^{1/2} N)$-area
layout. This is followed in section 1.2 by a detailed description of Hoey and
Leiserson's complex plane diagram. The complex plane diagram is very helpful in
finding good layouts for the shuffle-exchange graph. For example, Hoey and
Leiserson [HL80] have used the diagram to find an $O(N^2/\log N)$-area layout for the
$N$-node shuffle-exchange graph. In Chapter 2, we will use the diagram to find a
variety of layouts for the $N$-node shuffle-exchange graph including one which
requires only $O(N^2/\log^{3/2} N)$ area. (Such a layout has also recently been found
independently by Steinberg and Rodeh [SR80b].) The complex plane diagram will
also be used in Chapter 4 as an aid in the construction of good practical layouts
for small shuffle-exchange graphs.

1.1  Thompson's  Layout 

Thompson was the first to investigate VLSI layouts for the shuffle-exchange
graph. In his thesis [T80], he showed that any layout for the $N$-node
shuffle-exchange graph requires at least $\Omega(N^2/\log^2 N)$ area. (We reprove this fact using
crossing number arguments in Part II of the thesis.) In addition, he described a
layout requiring only $O(N^2/\log^{1/2} N)$ area. In what follows, we present
Thompson's layout and give a simple proof that it does, in fact, require just
$O(N^2/\log^{1/2} N)$ area.

Given any $k$-bit string $w$, define the size of $w$ to be the number of 1-bits it
contains. For example, the size of 10110 is 3. Thompson's idea was to lay out the
$N=2^k$ nodes of the shuffle-exchange graph on a straight line in order of
nondecreasing size. It is easily seen that shuffle edges link nodes which have the
same size and that exchange edges link nodes which have sizes differing by one.
Thus the edges of such a layout are relatively short. In particular, the number of
horizontal tracks needed to embed all of the edges is at most $O(\max_s B_s)$ where
$B_s$ is the number of nodes of size $s$. This is due to the fact that at most
$O(B_{s-1} + B_s + B_{s+1})$ edges can cross any vertical cut of the layout which is located
between a pair of nodes of size $s$.

It is easy to show that $B_s = C(k,s)$ for each $s$ where

$C(k,s) = k!/[s!(k-s)!]$

is the well-known function for binomial coefficients. It is also well-known that
$C(k,s)$ achieves its maximum value at $s = k/2$ for any $k$. Using standard asymptotic
analysis, it is easily shown that $C(k,k/2) = \Theta(2^k/k^{1/2})$ for large $k$. (For a good
review of such techniques, see Bender and Orszag's book [BO78].) Thus
Thompson's layout requires only $O(N/\log^{1/2} N)$ horizontal tracks. Since at most 3
vertical tracks are needed to embed the vertical portions of the edges incident to
any given node, we can conclude that Thompson's layout has area $O(N^2/\log^{1/2} N)$.
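
The counting facts used in this argument are easy to check numerically. The following Python sketch (an illustration only; the helper names `size` and `size_counts` are ours) tabulates $B_s$ by brute force, compares it with $C(k,s)$, and prints the ratio of the central binomial coefficient to $2^k/k^{1/2}$ for a few even $k$. By Stirling's approximation that ratio tends to $(2/\pi)^{1/2}$, which is consistent with the $\Theta(2^k/k^{1/2})$ estimate.

```python
from math import comb, sqrt


def size(w):
    """Number of 1-bits in the binary string of w."""
    return bin(w).count("1")


def size_counts(k):
    """Return [B_0, ..., B_k], where B_s counts the k-bit strings of size s."""
    counts = [0] * (k + 1)
    for w in range(1 << k):
        counts[size(w)] += 1
    return counts


k = 10
B = size_counts(k)
assert B == [comb(k, s) for s in range(k + 1)]  # B_s = C(k, s)
assert max(B) == B[k // 2]                      # the peak is at s = k/2

# Ratio of C(k, k/2) to 2^k / sqrt(k); it stays bounded as k grows.
for kk in (4, 8, 12, 16):
    print(kk, comb(kk, kk // 2) / (2 ** kk / sqrt(kk)))
```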

1.2 Hoey and Leiserson's Complex Plane Diagram

In  [HL80],  Hoey  and  Leiserson  observed  that  there  is  a  very  natural  embedding 
of  the  shuffle-exchange  graph  in  the  complex  plane.  In  what  follows,  we  describe 
this  embedding  (henceforth  referred  to  as  the  complex  plane  diagram)  and  point 
out  some  of  its  more  important  properties.  In  addition,  we  give  a  brief  description 
of  the  method  used  by  Hoey  and  Leiserson  to  transform  the  diagram  into  an 
$O(N^2/\log N)$-area layout for the $N$-node shuffle-exchange graph.

1.2.1  Definition 

Let $\delta_k = e^{2\pi i/k}$ denote the $k$th primitive root of unity. Given any $k$-bit binary
string $w = a_{k-1} \cdots a_0$, let $p(w)$ be the map which sends $w$ to the point

$p(w) = a_{k-1}\delta_k^{k-1} + \cdots + a_1\delta_k + a_0$

in the complex plane. As each node of the $(N=2^k)$-node shuffle-exchange graph
corresponds to a $k$-bit binary string, it is possible to use the map to embed the
shuffle-exchange graph in the complex plane. For example, we have done this for
the 32-node shuffle-exchange graph (whence $k=5$) in Figure 1-1. As is usual, we
have drawn the shuffle edges with solid lines and the exchange edges with dashed
lines. For simplicity, each node is labeled with its value instead of its 5-bit binary
string. (By the value of a node, we mean the numerical value of the associated
$k$-bit binary string.)

Figure 1-1: The complex plane diagram for the 32-node
shuffle-exchange graph. (Taken from [HL80].)
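
The map $p$ is straightforward to evaluate numerically. The sketch below is an illustration under our own conventions: a node is an integer whose bit $j$ is $a_j$, and the function name `p` simply mirrors the notation of the text.

```python
import cmath


def p(w, k):
    """Map the k-bit string of w to a_{k-1} d^{k-1} + ... + a_1 d + a_0,
    where d = exp(2*pi*i/k) and a_j is bit j of w."""
    d = cmath.exp(2j * cmath.pi / k)
    return sum((d ** j for j in range(k) if (w >> j) & 1), 0j)


k = 5
for w in (0, 1, 2, 16, 31):
    z = p(w, k)
    print(f"{w:2d} {w:05b} -> ({z.real:+.3f}, {z.imag:+.3f})")
```

Nodes 0 and 31 both land on the origin (up to floating-point error), while every node with a single 1-bit lands on the unit circle.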

1.2.2  Properties 

Examination  of  Figure  1-1  indicates  that  the  complex  plane  diagram  has  some 
very  interesting  properties.  First,  it  is  apparent  that  the  shuffle  edges  occur  in 
cycles  (which  we  call  necklaces)  which  are  symmetrically  placed  about  the  origin. 
This  phenomenon  is  easily  explained  by  the  following  identity: 

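$\delta_k\, p(a_{k-1} \cdots a_0) = a_{k-1}\delta_k^{k} + a_{k-2}\delta_k^{k-1} + \cdots + a_1\delta_k^{2} + a_0\delta_k$

$= a_{k-2}\delta_k^{k-1} + \cdots + a_0\delta_k + a_{k-1}$

$= p(a_{k-2} \cdots a_0 a_{k-1})$.

Thus traversal of a shuffle edge corresponds to a $2\pi/k$ rotation in the complex
plane.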
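
The rotation identity can be checked by brute force for small $k$. The following Python sketch is only a numerical sanity check, not a proof; the helper names are ours, and the tolerance `tol` is an arbitrary choice that absorbs floating-point error.

```python
import cmath


def p(w, k):
    """Position of node w in the complex plane diagram."""
    d = cmath.exp(2j * cmath.pi / k)
    return sum((d ** j for j in range(k) if (w >> j) & 1), 0j)


def left_shift(w, k):
    """Cyclic left shift a_{k-1} ... a_0 -> a_{k-2} ... a_0 a_{k-1}."""
    return ((w << 1) & ((1 << k) - 1)) | (w >> (k - 1))


def rotation_identity_holds(k, tol=1e-9):
    """Check d * p(w) == p(left_shift(w)) for every k-bit string w."""
    d = cmath.exp(2j * cmath.pi / k)
    return all(abs(d * p(w, k) - p(left_shift(w, k), k)) < tol
               for w in range(1 << k))


print(all(rotation_identity_holds(k) for k in range(2, 9)))
```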

Except for degenerate cases, the preceding identity also indicates that each
necklace is composed of $k$ nodes, each a cyclic shift of the other. Such necklaces
are called full necklaces. Degenerate necklaces contain fewer than $k$ nodes and,
because they must have some symmetry, are mapped entirely to the origin of the
complex plane diagram. For example, {00000} and {0101, 1010} are degenerate
necklaces while both {101, 011, 110} and {11100, 11001, 10011, 00111, 01110} are
full.

It will often be convenient to refer to a necklace by one of its nodes. In
particular, we will use the notation <w> to indicate the necklace generated by w.
This is simply the collection of cyclic shifts of w. For example, the necklace
generated by 101 is <101> = {101, 011, 110}.
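
Necklaces can be enumerated directly from this definition. The Python sketch below is illustrative; the function names are ours, and naming each necklace by its smallest member is our own convention for the example.

```python
def necklace(w, k):
    """Return <w>: the set of all cyclic shifts of the k-bit string of w."""
    mask = (1 << k) - 1
    shifts, x = set(), w
    for _ in range(k):
        shifts.add(x)
        x = ((x << 1) & mask) | (x >> (k - 1))
    return shifts


def necklaces(k):
    """All necklaces of the 2**k-node graph, keyed by their smallest node."""
    seen, result = set(), {}
    for w in range(1 << k):  # ascending, so the first unseen node is the minimum
        if w not in seen:
            members = necklace(w, k)
            seen |= members
            result[w] = members
    return result


def as_strings(members, k):
    return sorted(format(x, f"0{k}b") for x in members)


print(as_strings(necklace(0b101, 3), 3))  # the necklace <101>
full = [r for r, members in necklaces(5).items() if len(members) == 5]
print(len(full), "full necklaces for k = 5")
```

For $k=5$ the only degenerate necklaces are <0> and <31>, since 5 is prime.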

Exchange  edges  are  also  embedded  in  a  very  regular  fashion  by  the  complex 
plane  diagram.  In  fact,  each  exchange  edge  is  embedded  as  a  horizontal  line 
segment  of  unit  length.  This  phenomenon  is  explained  by  the  identity 

$p(a_{k-1} \cdots a_1 0) + 1 = a_{k-1}\delta_k^{k-1} + \cdots + a_1\delta_k + 1$

$= p(a_{k-1} \cdots a_1 1)$.

In some cases, several exchange edges are contained in the same horizontal line
of the diagram. Such lines are called levels. For example, there are 9 levels in the
diagram of the 32-node shuffle-exchange graph shown in Figure 1-1. We will use
the properties of levels in Chapter 2 to find an $O(N^2/\log^{3/2} N)$-area layout for the
$N$-node shuffle-exchange graph. They will also be used in Chapter 4 to find good
practical layouts for small shuffle-exchange graphs.
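
Both observations (unit-length horizontal exchange edges and the number of levels) can be checked numerically. In the Python sketch below, a level is identified with a distinct imaginary part of $p(w)$; rounding to 9 decimal digits to merge floating-point noise is our own assumption, as are the function names.

```python
import cmath


def p(w, k):
    """Position of node w in the complex plane diagram."""
    d = cmath.exp(2j * cmath.pi / k)
    return sum((d ** j for j in range(k) if (w >> j) & 1), 0j)


def exchange_edges_are_unit_horizontal(k, tol=1e-9):
    """Check p(a_{k-1}...a_1 1) - p(a_{k-1}...a_1 0) == 1 for every exchange edge."""
    return all(abs(p(w | 1, k) - p(w, k) - 1) < tol
               for w in range(0, 1 << k, 2))


def levels(k, digits=9):
    """Sorted distinct imaginary parts of the node positions."""
    return sorted({round(p(w, k).imag, digits) for w in range(1 << k)})


print(exchange_edges_are_unit_horizontal(5))
print(len(levels(5)), "levels for the 32-node graph")
```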

1.2.3 An $O(N^2/\log N)$-Area Layout

In [HL80], Hoey and Leiserson showed how to use the complex plane diagram
to construct an $O(N^2/\log N)$-area layout for the $N$-node shuffle-exchange graph.
Their method was very involved, however, and we have chosen not to include it
here. The basic idea is to use the structural properties of the complex plane
diagram to find an $O(N/\log^{1/2} N)$-separator for the $N$-node shuffle-exchange graph
whenever $N$ is of the form $2^{2^r}$ for some $r>0$. The separator can then be used to
construct an $O(N^2/\log N)$-area layout by using Leiserson's general layout technique
for graphs with known separators [L80a].

Shortly after writing [HL80], Hoey and Leiserson found a far simpler
$O(N^2/\log N)$-area layout for the $N$-node shuffle-exchange graph which was, in
addition, valid for all $N$. By that time, however, we (as well as several others)
had also observed that the complex plane diagram could be used to find a simple
layout for the shuffle-exchange graph. This layout is described in Chapter 2.


CHAPTER  2 


LAYOUTS  BASED  ON  THE  COMPLEX  PLANE  DIAGRAM 


In this chapter, we present several layouts of the shuffle-exchange graph which
are based on Hoey and Leiserson's complex plane diagram. We commence in
section 2.1 with a straightforward $O(N^2/\log N)$-area layout of the $N$-node
shuffle-exchange graph. As we mentioned in Chapter 1, this layout has also been
discovered by many others (including Hoey and Leiserson). In section 2.2, we
show how the layout can be modified so as to require only $O(N^2/\log^{3/2} N)$ area.
The latter layout was also discovered independently by Steinberg and Rodeh
[SR80b]. We conclude the chapter by mentioning an additional $O(N^2/\log^{3/2} N)$-area
layout as well as a layout which might require even less area.

2.1 A Straightforward $O(N^2/\log N)$-Area Layout

In this section, we describe a straightforward layout of the shuffle-exchange
graph which requires only $O(N^2/\log N)$ area. The layout is formed from a grid of
levels and necklaces which we refer to as the level-necklace grid. Each row of the
grid corresponds to a level of the complex plane diagram. The columns are
divided into consecutive column pairs, each pair corresponding to a necklace. In
particular, the leftmost column of each column pair corresponds to that part of the
necklace which is contained in the left half of the complex plane. Similarly, the
rightmost column corresponds to the part of the necklace contained in the right
half of the complex plane. We assume that the rows are ordered from top to
bottom so as to be consistent with the natural ordering of the levels in the complex
plane but (for the time being) place no restrictions on the left-to-right order of the
necklaces.

Each  node  of  the  shuffle-exchange  graph  is  placed  at  the  intersection  of  the  row 
and  column  of  the  grid  which  correspond  to  the  level  and  part  of  the  necklace  (left 
half  or  right  half)  to  which  it  belongs  in  the  complex  plane  diagram.  For  example, 
we have done this for a random ordering of the necklaces of the 32-node
shuffle-exchange graph in Figure 2-1.

Figure 2-1: A level-necklace grid for the 32-node shuffle-exchange graph.


Notice that we used just one vertical track to embed the necklaces <0> and <31>
in the grid. As each necklace contains just one node, it is clear that this is
sufficient. In general, necklaces which are mapped to the origin by the complex
plane diagram are a nuisance since they become lumped together in a single point
of the level-necklace grid. Fortunately, there are relatively few such nodes. In
particular, Hoey and Leiserson showed the following.

Lemma 2-1 (Hoey and Leiserson [HL80]): At most $O(N/\log N)$ nodes of the
$N$-node shuffle-exchange graph are mapped to the origin of the complex plane diagram.

Proof: Every node which is mapped to the origin of the complex plane diagram
is adjacent (via an exchange edge) to a node at position (1,0) or (-1,0). Any node
which is not mapped to the origin is contained in some full necklace, at most two
nodes of which are contained in positions (1,0) or (-1,0). Thus for every pair of
nodes which are mapped to the origin, there are at least $k = \log N$ nodes which
are not mapped to the origin. Thus at most $O(N/k) = O(N/\log N)$ nodes can be
mapped to the origin. □
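
The lemma can be illustrated by counting the nodes at the origin for small $k$. In the Python sketch below, the explicit bound $2N/k$ is our reading of the constant implicit in the proof (each pair of origin nodes is charged to $k$ nodes away from the origin); the tolerance `tol` and the function names are our own assumptions.

```python
import cmath


def p(w, k):
    """Position of node w in the complex plane diagram."""
    d = cmath.exp(2j * cmath.pi / k)
    return sum((d ** j for j in range(k) if (w >> j) & 1), 0j)


def origin_nodes(k, tol=1e-9):
    """Nodes whose position is the origin, up to floating-point error."""
    return [w for w in range(1 << k) if abs(p(w, k)) < tol]


for k in range(2, 11):
    n = 1 << k
    count = len(origin_nodes(k))
    assert count <= 2 * n / k  # the bound suggested by the proof
    print(k, n, count)
```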

Since at most $O(N/\log N)$ nodes are mapped to the origin, we can (for the time
being) ignore them. They can always be inserted later at a cost of at most
$O(N/\log N)$ additional vertical and horizontal tracks. Since any layout of the
shuffle-exchange graph which we will consider will have at least $\Omega(N/\log N)$ vertical
and horizontal tracks, the added tracks can increase the area of the final layout by
at most a constant factor. We will also use this strategy in Chapter 3 when we
ignore several $O(N/\log N)$-sized sets of nodes.

Since each full necklace contains exactly $k = \log N$ nodes, it is easy to see that
the $N$-node shuffle-exchange graph has at most $O(N/\log N)$ full necklaces. Thus at
most $O(N/\log N)$ vertical tracks are needed to embed all of the shuffle edges in the
level-necklace grid. It is also easy to show that at most $N$ horizontal tracks are
needed to embed all of the exchange edges (one track is used for each exchange
edge). Thus the total area of the layout for the $N$-node shuffle-exchange graph is
$O(N^2/\log N)$. As an example, we have added the edges of the 32-node
shuffle-exchange graph to the level-necklace grid in Figure 2-1 to produce the layout
shown in Figure 2-2. Note that we have omitted <0> and <31> in this layout since
they are mapped to the origin of the complex plane diagram.

Figure 2-2: Layout produced from the level-necklace grid shown in Figure 2-1.


2.2 An Improved $O(N^2/\log^{3/2} N)$-Area Layout

It  is  possible  to  improve  the  layout  described  in  section  2.1  by  reducing  the 
number  of  horizontal  tracks  needed  to  embed  the  exchange  edges.  This  can  be 
done  in  two  ways.  First,  exchange  edges  which  are  in  the  same  level  of  the 
complex  plane  diagram  but  which  do  not  overlap  in  the  level-necklace  grid  can  be 
inserted  on  the  same  horizontal  track.  As  more  exchange  edges  are  inserted  on  the 
same  track,  fewer  total  tracks  will  be  needed  to  embed  all  of  the  exchange  edges. 
Secondly, the necklaces can be re-ordered so as to increase the average number of
exchange edges which can be inserted on each horizontal track.

Although we do not know how to best order the necklaces in general, we have
found several orderings which yield $O(N^2/\log^{3/2} N)$-area layouts for the $N$-node
shuffle-exchange graph. For instance, we will show in what follows that such a
layout can be constructed by arranging the necklaces from left to right in order of
nondecreasing size. (The size of a necklace is simply defined to be the size of any
of its nodes.) This observation has also been made by Steinberg and Rodeh in
[SR80b].

In order to bound the number of horizontal tracks needed to insert the exchange
edges, we will show that the maximum overlap of exchange edges on each level
occurs in between necklaces of size $k/2$. Since the maximum overlap of exchange
edges on each level is an upper bound on the number of horizontal tracks needed
to insert the exchange edges on that level, we can thus conclude that the total
number of horizontal tracks needed to insert all of the exchange edges is at most

$O(B_{k/2}) = O(N/\log^{1/2} N)$.

Thus the resulting layout will have area at most $O(N^2/\log^{3/2} N)$.
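
The track count for this ordering can be explored numerically. The Python sketch below is a rough illustration under several assumptions of our own: necklaces are named by their smallest node, nodes at the origin (and exchange edges touching them) are ignored as in section 2.1, nodes on the imaginary axis are placed in the right-hand column, and levels are identified by rounding imaginary parts. It sums, over all levels, the maximum number of exchange edges that overlap between two adjacent columns.

```python
import cmath
from collections import defaultdict


def horizontal_tracks(k, order_key, tol=1e-9, digits=9):
    """Sum over levels of the maximum overlap of exchange edges when the
    necklaces are placed left to right in the order given by order_key."""
    n, mask = 1 << k, (1 << k) - 1
    d = cmath.exp(2j * cmath.pi / k)
    pos = [sum((d ** j for j in range(k) if (w >> j) & 1), 0j) for w in range(n)]

    def rep(w):
        # Smallest cyclic shift of w, used as the name of its necklace.
        best, x = w, w
        for _ in range(k - 1):
            x = ((x << 1) & mask) | (x >> (k - 1))
            best = min(best, x)
        return best

    off_origin = [w for w in range(n) if abs(pos[w]) > tol]
    reps = sorted({rep(w) for w in off_origin}, key=order_key)
    index = {r: i for i, r in enumerate(reps)}

    def column(w):
        # Two columns per necklace: left half-plane first, then right.
        return 2 * index[rep(w)] + (1 if pos[w].real > -tol else 0)

    spans = defaultdict(list)  # level -> list of (left column, right column)
    for u in range(0, n, 2):
        v = u | 1
        if abs(pos[u]) > tol and abs(pos[v]) > tol:
            lo, hi = sorted((column(u), column(v)))
            spans[round(pos[u].imag, digits)].append((lo, hi))

    total = 0
    for level_spans in spans.values():
        first = min(lo for lo, _ in level_spans)
        last = max(hi for _, hi in level_spans)
        # Overlap at the gap between columns g and g + 1.
        total += max(sum(lo <= g < hi for lo, hi in level_spans)
                     for g in range(first, last))
    return total


def by_size(r):
    """Order necklaces by nondecreasing size, breaking ties by value."""
    return (bin(r).count("1"), r)


for k in range(4, 10):
    print(k, 1 << k, horizontal_tracks(k, by_size))
```

Passing a different `order_key` gives a quick way to compare orderings on small graphs; the printed counts are experimental values for this sketch, not bounds.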

It  is  not  immediately  clear  why  the  maximum  overlap  on  each  level  occurs 
between  nodes  of  size  k/2,  however.  In  what  follows,  we  break  up  each  level  into 
sublevels (for which the analysis is easier) and show that the maximum overlap on
each  sublevel  occurs  between  necklaces  of  size  k/2.  Before  doing  this,  however,  we 
must  introduce  some  further  notation. 

Consider a node of the form $a_{k-1} \cdots a_0$ for which either $a_{k-i}=0$ or $a_i=0$ or
both for each $i<k$ (with subscripts taken modulo $k$, so that $a_0=0$). We will refer
to such a node as a basis node. A node $b_{k-1} \cdots b_0$ is said to be generated by
the basis node $a_{k-1} \cdots a_0$ if

1) $b_{k-i}=a_{k-i}$ and $b_i=a_i$ whenever $a_{k-i} \ne a_i$ for $1 \le i < k$, and

2) $b_{k-i}=b_i$ whenever $a_{k-i}=a_i=0$ for $1 \le i < k$.

For example, 10000 generates 10001, 11100 and 11101 but not 11111.

It is not difficult to show that if $u$ generates $v$, then both $u$ and $v$ are on the same
level of the complex plane diagram. For example, let $u = a_{k-1} \cdots a_0$ and
$v = b_{k-1} \cdots b_0$ and observe that

$p(v) - p(u) = (b_{k-1} - a_{k-1})\delta_k^{k-1} + \cdots + (b_1 - a_1)\delta_k + (b_0 - a_0)$

$= c_{k-1}\delta_k^{k-1} + \cdots + c_1\delta_k + c_0$

where $c_{k-i} = c_i$ for each $i$, $1 \le i < k$. Since $\delta_k^{k-i}$ is the complex conjugate of
$\delta_k^{i}$ for $1 \le i < k$, we can conclude that $p(v) - p(u)$ is a real number and thus
that $u$ and $v$ are in the same level of the complex plane diagram.

It is also easy to show that each node of the shuffle-exchange graph is generated
by a unique basis node. In particular, the node which generates $b_{k-1} \cdots b_0$ can
be found by

1) setting $b_0=0$ and (if $k$ is even) setting $b_{k/2}=0$, and

2) setting $b_i=b_{k-i}=0$ for each $i$ such that (originally) $b_i=b_{k-i}=1$.
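
The two-step procedure and the definition of generation translate directly into code. The Python sketch below is an illustration only; the function names `basis_of`, `is_basis` and `generates` are ours, and nodes are integers whose bit $j$ is $b_j$.

```python
def bit(w, j):
    return (w >> j) & 1


def basis_of(w, k):
    """Basis node that generates w, following the two steps above."""
    b = w & ~1                         # step 1: set b_0 = 0
    if k % 2 == 0:
        b &= ~(1 << (k // 2))          # ... and b_{k/2} = 0 when k is even
    for i in range(1, k):
        if bit(w, i) and bit(w, k - i):  # step 2: pairs that were originally both 1
            b &= ~((1 << i) | (1 << (k - i)))
    return b


def is_basis(a, k):
    return basis_of(a, k) == a


def generates(a, b, k):
    """True if the basis node a generates b (conditions 1 and 2 of the definition)."""
    if not is_basis(a, k):
        return False
    for i in range(1, k):
        ai, aj = bit(a, i), bit(a, k - i)
        if ai != aj and (bit(b, i), bit(b, k - i)) != (ai, aj):
            return False               # condition 1 violated
        if ai == aj == 0 and bit(b, i) != bit(b, k - i):
            return False               # condition 2 violated
    return True


for v in (0b10001, 0b11100, 0b11101, 0b11111):
    print(format(v, "05b"), generates(0b10000, v, 5))
```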

Since  exchange  edges  link  nodes  which  are  in  the  same  sublevel,  we  can 
conclude  from  the  preceding  arguments  that  it  is  possible  to  partition  each  level  of 
the  complex  plane  diagram  into  sublevels  so  that  the  nodes  in  each  sublevel  are 
precisely  the  nodes  generated  by  some  basis  node.  We  will  now  show  that  the 
maximum  overlap  at  each  sublevel  occurs  between  necklaces  of  size  k/2. 

Since the necklaces have been arranged from left to right in order of nondecreasing size, we can use arguments similar to those of section 1.1 to conclude that the overlap of exchange edges between two nodes of size $s$ in any sublevel is at most $O(\max_s B_s')$ where $B_s'$ is the number of nodes in that sublevel with size $s$. A straightforward counting argument shows that each basis node of size $r$ generates

1) $C(\lfloor k/2 \rfloor - r, i)$ nodes of size $s = r+2i$ for any $i \le \lfloor k/2 \rfloor - r$, and

2) $C(\lfloor k/2 \rfloor - r, i)$ nodes of size $s = r+2i+1$ for any $i \le \lfloor k/2 \rfloor - r$

when $k$ is odd, and

1) $C(k/2 - r - 1, i) + C(k/2 - r - 1, i - 1) = C(k/2 - r, i)$ nodes of size $s = r+2i$ for any $i \le k/2 - r$, and

2) $2C(k/2 - r - 1, i)$ nodes of size $s = r+2i+1$ for any $i \le k/2 - r - 1$

when $k$ is even. We can therefore conclude that in all cases, the maximum value of $B_s'$ occurs when $i = (k - 2r)/4$ and thus when $s = k/2$. This concludes the proof.
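The counting formulas can be confirmed by brute force for small $k$. The sketch below assumes that the size of a node is its number of 1-bits and reads $k/2$ as $\lfloor k/2 \rfloor$ when $k$ is odd; the helper names are hypothetical.

```python
from collections import Counter
from math import comb

def basis_node(node: str) -> str:
    """Basis node generating `node` (clear b_0, the middle bit, and every 11 pair)."""
    k = len(node)
    b = [int(c) for c in reversed(node)]
    b[0] = 0
    if k % 2 == 0:
        b[k // 2] = 0
    for i in range(1, (k + 1) // 2):
        if b[i] == 1 and b[k - i] == 1:
            b[i] = b[k - i] = 0
    return "".join(str(x) for x in reversed(b))

def counts_match(k: int) -> bool:
    """Compare the number of generated nodes of each size with the formulas."""
    observed = Counter()
    for v in range(2 ** k):
        node = format(v, f"0{k}b")
        observed[(basis_node(node), node.count("1"))] += 1
    for (base, s), n in observed.items():
        r = base.count("1")
        i, parity = divmod(s - r, 2)
        if k % 2 == 1 or parity == 0:
            expected = comb(k // 2 - r, i)
        else:
            expected = 2 * comb(k // 2 - r - 1, i)
        if n != expected:
            return False
    return True

print(counts_match(5), counts_match(6), counts_match(9))
```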

As an example, we have drawn such a layout for the 32-node shuffle-exchange graph in Figure 2-3. Note that far fewer horizontal tracks are needed for this layout than are used for the layout in Figure 2-2. For completeness, we have included the necklaces <0> and <31> even though they are degenerate.


[Figure 2-3 shows the necklaces <0>, <1>, <3>, <5>, <7>, <11>, <15> and <31> arranged from left to right across the levels of the complex plane diagram.]

Figure 2-3: An improved layout for the 32-node shuffle-exchange graph.

2.3 Other Layouts

It is not difficult to find other orderings of the necklaces which produce $O(N^2/\log^{3/2} N)$-area layouts for the $N$-node shuffle-exchange graph. For example, Lepley [LLM81] used standard statistical methods to show that the arrangement of necklaces from left to right in order of nondecreasing radius produces such a layout. (By the radius of a necklace, we mean the radius of the circle in the complex plane which contains the necklace.) The proof is similar to the one in section 2.2. In particular, it is shown that the maximum overlap in most levels occurs in the same place and that the total overlap of all of the levels at that point is $O(N/\log^{1/2} N)$.

Although we consider it likely that better orderings of the necklaces exist, we do not know of any ordering which (provably) results in a layout with less than $O(N^2/\log^{3/2} N)$ area. There is another ordering of interest, however. That is the ordering of the necklaces according to the minimal number represented by each necklace. (The minimum number represented by a necklace is simply the smallest value of any node in the necklace.) Coincidentally, the layout displayed in Figure 2-3 has such an ordering. Using techniques which are developed in Chapter 3, it is possible to show that the combined maximum overlap of exchange edges in all levels is at most $O(N \log\log N/\log N)$ for this ordering. This is substantially better than the $O(N/\log^{1/2} N)$ overlap found in previous orderings and also very close to the lower bound of $\Omega(N/\log N)$. Unfortunately, we do not know how to show that the maximum overlap at each level occurs in the same place. In fact, it appears that this may not be the case. (We are deeply indebted to Kleitman for pointing out the possibility of such an improvement. Although we were not able to use his idea in the context of complex plane diagram layouts, it was crucial to the development of the asymptotically optimal layout described in Chapter 3.)

For  orderings  which  have  a  small  combined  maximal  overlap  but  for  which  the 
maximal  overlap  at  each  level  is  difficult  to  compute  (such  as  the  ordering  by 
minimal  value  represented),  it  may  be  possible  to  improve  the  situation  by  altering 
the  level  structure.  As  Miller  pointed  out  to  us,  there  are  many  possible  levelings 
of  the  exchange  edges.  (By  a  leveling ,  we  mean  any  arrangement  of  the  exchange 
edges  in  levels  which  is  consistent  with  the  necklace  structure  of  the  complex  plane 
diagram.)  Although  we  have  investigated  several  levelings,  we  have  not  found  any 
(provably) better layouts for the shuffle-exchange graph by this method.

CHAPTER 3


MORE  SOPHISTICATED  LAYOUTS 


In section 3.3 of this chapter, we describe an asymptotically optimal $O(N^2/\log^2 N)$-area layout for the $N$-node shuffle-exchange graph. Unlike the previously described layouts, the optimal layout is fairly sophisticated and requires a substantial amount of preliminary machinery. Most of the necessary definitions and lemmas are included in section 3.1. In section 3.2, we describe and analyze a near-optimal preliminary version of the optimal layout. The optimal layout is then described in section 3.3. In section 3.4, we extend the methods developed in earlier sections in order to show that certain useful supergraphs of the $N$-node shuffle-exchange graph can also be laid out in $O(N^2/\log^2 N)$ area. We have also included an appendix to the chapter in which we prove Lemmas 3-1 through 3-4.

3.1  Preliminaries 

The  layouts  described  in  this  chapter  are  based  on  some  important  combinatorial 
properties  of  strings  which  contain  long  blocks  of  consecutive  zeros.  Before 
describing the layouts, however, it is useful to review some of these properties. In
this  section,  we  mention  several  combinatorial  lemmas  and  definitions  which  will 
be  heavily  used  in  the  analysis  which  follows  later.  As  the  proofs  of  the  lemmas 
are  somewhat  complicated,  they  have  been  included  in  the  appendix. 

In  what  follows,  we  will  be  particularly  interested  in  the  size  and  location  of  the 
longest block of consecutive 0-bits in the $k$-bit binary string associated with each
node.  In  order  that  the  size  of  this  block  be  the  same  for  all  nodes  within  a 
necklace,  we  allow  blocks  to  begin  at  the  end  and  end  at  the  beginning  of  a  string. 
For  example,  the  longest  block  of  zeros  in  the  string  01010  starts  at  the  fifth  bit  and 
has  length  two. 
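The wrap-around convention is easy to get wrong, so a small sketch may help. The function below (a hypothetical helper, with 1-indexed positions as in the text) scans for blocks that start at a 0 preceded cyclically by a 1.

```python
def longest_zero_block(s: str):
    """Return (length, start) of the longest cyclic block of zeros in s.

    `start` is the 1-indexed position of the block's first bit. An all-zero
    string is reported as one block of length len(s) starting at bit 1.
    """
    k = len(s)
    if "1" not in s:
        return k, 1
    best_len, best_start = 0, 0
    for start in range(k):
        # A block begins at a 0 whose cyclic predecessor is a 1.
        if s[start] == "0" and s[start - 1] == "1":
            length = 0
            while s[(start + length) % k] == "0":
                length += 1
            if length > best_len:
                best_len, best_start = length, start + 1
    return best_len, best_start

print(longest_zero_block("01010"))
```

Because the block length is computed cyclically, it is the same for every rotation of a string, which is the property the text asks for.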

Let $\Psi_k(t)$ denote the number of $k$-bit strings for which the longest block of consecutive zeros has length $t$. For example, $\Psi_3(2) = 3$. The following combinatorial lemma provides a good asymptotic bound on the growth of $\Psi_k(t)$.

Lemma 3-1: For $(\log k)/2 + \log\ln k \le t \ll k$ and $k \to \infty$,

\[
\Psi_k(t) \sim 2^k \left( e^{-k 2^{-t-2}} - e^{-k 2^{-t-1}} \right).
\]

In order to illustrate the important features of the function in Lemma 3-1, we have sketched a graph of $2^{-k}\Psi_k(t)$ versus $t$ in Figure 3-1. The maximum of $2^{-k}\Psi_k(t)$ occurs at $t = \log k - 1$ whence

\[
2^{-k}\Psi_k(t) = (e^{1/2} - 1)/e \approx .23865.
\]

For $t > \log k - 1$, $2^{-k}\Psi_k(t)$ decreases exponentially as $t$ increases. For $t < \log k - 1$, $2^{-k}\Psi_k(t)$ decreases doubly exponentially as $t$ decreases.
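The quoted peak value can be reproduced numerically. The sketch below assumes the limiting density has the form $e^{-k2^{-t-2}} - e^{-k2^{-t-1}}$, a form chosen here because it yields exactly $(e^{1/2}-1)/e$ at $t = \log k - 1$; the brute-force counter `psi` and the other names are hypothetical helpers.

```python
from math import e, exp

def longest_zero_run(s: str) -> int:
    """Length of the longest cyclic block of zeros in s."""
    if "1" not in s:
        return len(s)
    # Doubling the string lets a block wrap from the end to the beginning.
    return max(len(run) for run in (s + s).split("1"))

def psi(k: int, t: int) -> int:
    """Count k-bit strings whose longest cyclic block of zeros has length t."""
    return sum(longest_zero_run(format(v, f"0{k}b")) == t for v in range(2 ** k))

def limiting_density(k: int, t: int) -> float:
    """Assumed asymptotic form of psi(k, t) / 2**k."""
    return exp(-k * 2.0 ** (-t - 2)) - exp(-k * 2.0 ** (-t - 1))

print(psi(3, 2))                                   # the example from the text
print(limiting_density(1024, 9), (e ** 0.5 - 1) / e)   # t = log k - 1 for k = 1024
```

The brute-force counter is only practical for small $k$, where the asymptotic form is not expected to be accurate; it is included to make the definition concrete.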


Figure  3-1 :  Density  of  k-bit  binary  strings  for  which  the 
longest  block  of  consecutive  zeros  has  length  t. 


Roughly speaking, Lemma 3-1 states that the longest block of consecutive zeros in nearly 1/4 of all $k$-bit strings has length precisely $\log k - 1$. Further, there are not many strings of length $k$ with substantially more than $\log k$ consecutive zeros and even fewer strings for which the longest block of consecutive zeros has length substantially less than $\log k$. This information is further quantified in the following lemma.

Lemma 3-2: The number of $k$-bit strings for which the longest block of consecutive zeros has length less than $\log k - \log\ln k - 1$ or length greater than $2\log k$ is at most $O(2^k/k) = O(N/\log N)$.

As we mentioned in Chapter 2, we may ignore $O(N/\log N)$-sized sets of nodes which have undesirable properties. As such nodes can be inserted with the addition of at most $O(N/\log N)$ vertical and horizontal tracks, we can always add them later without increasing the total area by more than a constant factor. By Lemma 3-2, we can thus henceforth consider only those nodes for which the longest block of zeros has length between $\log k - \log\ln k - 1$ and $2\log k$.

We  will  also  be  interested  in  the  size  of  the  second  longest  block  of  consecutive 
zeros  in  each  string.  Usually,  the  size  of  the  second  longest  block  of  zeros  will  be 
very  close  to  the  size  of  the  longest  block  of  zeros.  We  state  this  observation  more 
precisely  in  the  following  lemma. 

Lemma 3-3: The sum over all necklaces of the difference in length between the longest and second longest blocks of consecutive zeros is at most $O(N/\log N)$.

Using information about the size and location of blocks of zeros within the necklace, it is possible to distinguish one particular node in the necklace. More precisely, we define the distinguished node of a necklace to be the node containing the longest leading block of zeros. For example, 00101 is the distinguished node of <01010>. Should two or more nodes of a necklace begin with equal and maximal length blocks of zeros, then each node of the necklace contains at least two blocks of zeros of maximal length. In such cases, we distinguish that node for which the leading block of zeros is maximal and for which the second occurrence of a maximal length block of zeros is as near as possible to the beginning of the string. For example, 01011 (not 01101) is the distinguished node of the necklace <10101>. For some necklaces, such as <111> and <1010101>, there is no uniquely distinguished node. As we show in the following lemma, such necklaces are sufficiently rare that we need not consider them further.

Lemma 3-4: At most $O(N/\log N)$ nodes are contained in necklaces which fail to have a uniquely distinguished node.
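The selection rule for the distinguished node can be written out as follows. This is a sketch under two stated assumptions: necklaces consisting only of zeros or only of ones are treated as having no distinguished node, and a tie on the position of the second maximal block means no node is uniquely distinguished. The function name is hypothetical.

```python
from typing import Optional

def distinguished_node(s: str) -> Optional[str]:
    """Return the distinguished node of the necklace of s, or None if none is unique."""
    if "0" not in s or "1" not in s:
        return None                          # assumed: <0...0> and <1...1> have none
    k = len(s)
    nodes = {s[i:] + s[:i] for i in range(k)}
    # A leading block is complete only if the node starts with 0 and ends with 1.
    cands = [n for n in nodes if n[0] == "0" and n[-1] == "1"]
    lead = lambda n: len(n) - len(n.lstrip("0"))
    m = max(lead(n) for n in cands)          # length of the longest block of zeros
    cands = [n for n in cands if lead(n) == m]
    if len(cands) == 1:
        return cands[0]
    # Tie: prefer the node whose second maximal block starts earliest.
    keyed = sorted((n.find("0" * m, m), n) for n in cands)
    return None if keyed[0][0] == keyed[1][0] else keyed[0][1]

for necklace in ("01010", "10101", "1010101", "111"):
    print(necklace, distinguished_node(necklace))
```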

We refer to the leading block of zeros of a distinguished node as the primary block of zeros. If a distinguished node has two or more maximal length blocks of zeros, then the maximal length block following the primary block is referred to as the secondary block of zeros. These definitions can be easily extended to any node contained in a necklace which has a uniquely distinguished node. For example, the primary block of zeros of 01010 starts in the fifth bit and has length two. Note that this string does not have a secondary block of zeros. As another example, we note that the secondary block of zeros in the string 11010 consists solely of the fifth bit. Note that the secondary block of zeros (if it exists) always has the same length as the primary block of zeros.

If the last bit of a node occurs in the primary block of zeros, we call that node a primary node. Similarly, if the last bit of a node occurs in the secondary block of zeros, we call the node a secondary node. For example, 10110 is a primary node, 11010 is a secondary node and 10010 is neither primary nor secondary.
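These definitions can be tested on the three examples with a short sketch. It assumes the necklace has a uniquely distinguished node and is not periodic; `distinguished` and `classify` are hypothetical helper names.

```python
def distinguished(node: str):
    """Return (d, m): the distinguished node of node's necklace and the
    length m of its primary block. Assumes a unique distinguished node exists."""
    rots = {node[i:] + node[:i] for i in range(len(node))}
    cands = [r for r in rots if r[0] == "0" and r[-1] == "1"]
    lead = lambda r: len(r) - len(r.lstrip("0"))
    m = max(lead(r) for r in cands)
    best = min((r for r in cands if lead(r) == m),
               key=lambda r: r.find("0" * m, m))
    return best, m

def classify(node: str) -> str:
    """Classify a node as 'primary', 'secondary' or 'neither'."""
    d, m = distinguished(node)
    k = len(d)
    # j is the 1-indexed bit of d that ends up as the last bit of `node`.
    j = next(j for j in range(1, k + 1) if d[j % k:] + d[:j % k] == node)
    sec = d.find("0" * m, m)     # 0-indexed start of the secondary block, or -1
    if j <= m:
        return "primary"
    if sec != -1 and sec < j <= sec + m:
        return "secondary"
    return "neither"

for node in ("10110", "11010", "10010"):
    print(node, classify(node))
```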

Note that all primary and secondary nodes are necessarily even. (We say that a node is even if its last bit is 0 and odd if its last bit is 1.) Note also that, by Lemma 3-2, we need only consider necklaces which contain between $\log k - \log\ln k - 1$ and $2\log k$ primary nodes. Such necklaces will also have at most $2\log k$ secondary nodes.

In what follows, we will represent nodes in terms of their corresponding distinguished nodes. More precisely, we use the notation $a_{k-1} \cdots a_{i+1}[a_i]a_{i-1} \cdots a_0$, in which one bit of the distinguished node is marked (the mark is written here as brackets), to denote the node $a_{i-1} \cdots a_0 a_{k-1} \cdots a_i$. For example, 001[0]1 denotes the node 10010. Using this notation, a primary node has the form 0...[0]...0w while a secondary node has the form 0...0w'0...[0]...0w'' where 0...0w and 0...0w'0...0w'' are assumed to be distinguished nodes.

3.2  A  Near-Optimal  Layout 

We  are  now  prepared  to  describe  a  near-optimal  preliminary  version  of  the 
optimal  layout.  In  section  3.3,  we  will  show  how  to  modify  this  layout  in  order  to 
construct an optimal $O(N^2/\log^2 N)$-area layout for the $N$-node shuffle-exchange
graph. 

3.2.1 Location of the Nodes

The near-optimal layout is constructed from a $\log N \times O(N/\log N)$ grid of nodes. Each column of the grid corresponds to a necklace of the shuffle-exchange graph. The nodes of each necklace are ordered from top to bottom so that the $i$th node is a left cyclic shift of the $(i-1)$st node for each $i$ and so that the distinguished node is placed in the bottom row. The necklaces are ordered from left to right so that the values of the distinguished nodes form an increasing sequence. For example, we have constructed such a grid for the 32-node shuffle-exchange graph in Figure 3-2. In the figure, we have represented each node in terms of the associated distinguished node. This representation readily illustrates the fact that the last bit of any node in the $i$th row corresponds to the $i$th bit of the associated distinguished node. Note that the necklaces <00000> and <11111> have not been included since they are degenerate.
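The grid construction can be sketched for small $k$ as follows. The helper names are hypothetical, and necklaces without a uniquely distinguished node (including the two degenerate ones) are simply left out, which is an assumption made for this illustration.

```python
from typing import Optional

def distinguished_node(s: str) -> Optional[str]:
    """Distinguished node of the necklace of s, or None if none is unique."""
    if "0" not in s or "1" not in s:
        return None
    nodes = {s[i:] + s[:i] for i in range(len(s))}
    cands = [n for n in nodes if n[0] == "0" and n[-1] == "1"]
    lead = lambda n: len(n) - len(n.lstrip("0"))
    m = max(lead(n) for n in cands)
    cands = [n for n in cands if lead(n) == m]
    if len(cands) == 1:
        return cands[0]
    keyed = sorted((n.find("0" * m, m), n) for n in cands)
    return None if keyed[0][0] == keyed[1][0] else keyed[0][1]

def build_grid(k: int):
    """Return the grid as a list of k rows; row i (1-indexed) is grid[i - 1]."""
    seen, columns = set(), []
    for v in range(2 ** k):
        s = format(v, f"0{k}b")
        if s in seen:
            continue
        seen.update(s[i:] + s[:i] for i in range(k))
        d = distinguished_node(s)
        if d is not None:
            columns.append(d)
    columns.sort()      # equal-length bit strings sort by numeric value
    # Row i holds d shifted left cyclically i times; row k holds d itself.
    return [[d[i % k:] + d[:i % k] for d in columns] for i in range(1, k + 1)]

for row in build_grid(5):
    print(" ".join(row))
```

For $k = 5$ this produces five rows and six columns, with the distinguished nodes in the bottom row in increasing order.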
Figure 3-2: The grid of nodes for the 32-node shuffle-exchange graph.


3.2.2  Insertion  of  the  Edges 

It is easily observed that the shuffle edges can be inserted in the grid with the addition of $O(N/\log N)$ vertical and 2 horizontal tracks. In the following, we will show that the exchange edges can be inserted with the addition of $O(N \log\log N/\log N)$ vertical and horizontal tracks. Thus the total area of the layout is $O(N^2(\log\log N)^2/\log^2 N)$. This is only a factor of $O((\log\log N)^2)$ off from the lower bound of $\Omega(N^2/\log^2 N)$.

The analysis is divided into two parts. In part (a), we show that only $O(N \log\log N/\log N)$ exchange edges link nodes which are in different rows of the grid. Thus such edges can be inserted with the addition of at most $O(N \log\log N/\log N)$ vertical and horizontal tracks. In part (b), we conclude the analysis by showing that at most $O(N/\log N)$ horizontal tracks are needed to insert the exchange edges which link two nodes in the same row.

(a) Exchange Edges Which Link Nodes in Different Rows

Consider an exchange edge which links two nodes that are in different rows of the grid. In particular, assume that the edge is incident to an even node in the $i$th row for some $i$. By definition, the even node can be represented as w[0]w' (brackets mark the $i$th bit, which is the last bit of the node) where $|w| = i-1$ and w0w' is the distinguished node of <w0w'>. The exchange edge is also incident to the odd node w[1]w'. By assumption, w[1]w' is not located in the $i$th row and thus w1w' is not a distinguished node. Since w0w' is a distinguished node, we know that the $i$th bit of w0w' (the bit that was changed in order to produce w1w') must be in the primary or secondary block of zeros of w0w'. Otherwise, the primary and (if it exists) secondary blocks of zeros of w1w' would be identical in location and size to the primary and secondary blocks of w0w'. This would imply that w1w' is also distinguished, a contradiction. Thus w[0]w' must be a primary or secondary node. As was previously mentioned, we can assume that each necklace has at most $2\log k = 2\log\log N$ primary and $2\log\log N$ secondary nodes. Thus at most $4\log\log N$ nodes in each necklace are both even and incident to an exchange edge which links nodes in different rows. Since every exchange edge is incident to an even node and since there are $O(N/\log N)$ necklaces, we can conclude that there are at most $O(N \log\log N/\log N)$ exchange edges which link nodes in different rows.
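The claim that a cross-row exchange edge always has a primary or secondary node at its even end can be checked exhaustively for small $k$. The sketch below uses hypothetical helper names and skips edges that touch a necklace without a uniquely distinguished node, which is an assumption of this illustration.

```python
from typing import Optional

def distinguished_node(s: str) -> Optional[str]:
    """Distinguished node of the necklace of s, or None if none is unique."""
    if "0" not in s or "1" not in s:
        return None
    nodes = {s[i:] + s[:i] for i in range(len(s))}
    cands = [n for n in nodes if n[0] == "0" and n[-1] == "1"]
    lead = lambda n: len(n) - len(n.lstrip("0"))
    m = max(lead(n) for n in cands)
    cands = [n for n in cands if lead(n) == m]
    if len(cands) == 1:
        return cands[0]
    keyed = sorted((n.find("0" * m, m), n) for n in cands)
    return None if keyed[0][0] == keyed[1][0] else keyed[0][1]

def row_and_flag(node: str):
    """Return (row, is_primary_or_secondary) for a node, or None if its
    necklace has no uniquely distinguished node."""
    d = distinguished_node(node)
    if d is None:
        return None
    k, m = len(d), len(d) - len(d.lstrip("0"))
    row = next(j for j in range(1, k + 1) if d[j % k:] + d[:j % k] == node)
    sec = d.find("0" * m, m)
    special = row <= m or (sec != -1 and sec < row <= sec + m)
    return row, special

def cross_row_edges(k: int):
    """List (even, odd, flag) for exchange edges whose ends lie in different rows."""
    edges = []
    for v in range(0, 2 ** k, 2):
        even, odd = format(v, f"0{k}b"), format(v + 1, f"0{k}b")
        a, b = row_and_flag(even), row_and_flag(odd)
        if a and b and a[0] != b[0]:
            edges.append((even, odd, a[1]))
    return edges

edges = cross_row_edges(5)
print(len(edges), all(flag for _, _, flag in edges))
```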

(b) Exchange Edges Which Link Nodes in the Same Row

We next show that those exchange edges which link two nodes that are in the same row can be inserted with the addition of at most $O(N/\log N)$ horizontal tracks. Once again, the analysis is divided into two parts. In the first part, we show that at most $O(N/\log N)$ exchange edges are contained in the first $\log k$ rows. Such edges can be trivially inserted with the addition of $O(N/\log N)$ horizontal tracks. In the second part, we show that only $2^{k-i}$ horizontal tracks are needed to insert the exchange edges in the $i$th row for any $i > \log k$. Since $\sum_{i > \log k}^{k} 2^{k-i} < 2^k/k = N/\log N$, this will be sufficient to show that at most $O(N/\log N)$ additional horizontal tracks are necessary to insert the remaining exchange edges.
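The summation step is a finite geometric series. The following worked expansion is a sketch under the assumptions that $\log k$ is an integer and that $N = 2^k$:

```latex
\[
\sum_{i=\log k+1}^{k} 2^{k-i}
  = \sum_{j=0}^{k-\log k-1} 2^{j}
  = 2^{k-\log k} - 1
  < \frac{2^{k}}{k}
  = \frac{N}{\log N}.
\]
```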

Consider a necklace which has $t$ primary nodes for some $t < \log k$. By definition, the nodes in the first $t$ rows of such a necklace are all even. Thus, such a necklace can have at most $r = \log k - t$ odd nodes in the first $\log k$ rows. By Lemma 3-1, we know that there are

\[
(2^k/k)\left( e^{-k 2^{-t-2}} - e^{-k 2^{-t-1}} \right)
\]

such necklaces for $(\log k)/2 + \log\ln k \le t \ll k$. By Lemma 3-2, we can assume that $t \ge \log k - \log\ln k - 1$ and thus the total number of odd nodes occurring in the first $\log k$ rows is at most

\[
\begin{aligned}
\sum_{t=\log k-\log\ln k-1}^{\log k} (\log k - t)\,(2^k/k)\left( e^{-k 2^{-t-2}} - e^{-k 2^{-t-1}} \right)
  &= \sum_{r=0}^{\log\ln k+1} r\,(2^k/k)\left( e^{-2^{r-2}} - e^{-2^{r-1}} \right) \\
  &< (2^k/k) \sum_{r=0}^{\infty} r\, e^{-2^{r-2}} \\
  &= O(N/\log N).
\end{aligned}
\]

Since every exchange edge is incident to an odd node, the above bound implies that at most $O(N/\log N)$ exchange edges are contained in the first $\log k$ rows.

We next consider the number of horizontal tracks necessary to insert the exchange edges contained in the $i$th row for $i > \log k$. This number is identical to the maximum number of exchange edges that can overlap each other at a single point of the $i$th row. In Figure 3-3, we illustrate the necessary conditions for two exchange edges to overlap in the $i$th row. All representations are in terms of distinguished nodes.


[Figure 3-3 shows overlapping exchange edges in level $i$, each joining an even node to an odd node whose representations share the prefix w, where $|w| = i-1$.]

Figure 3-3: Necessary conditions for exchange edges to overlap in the $i$th row.


Note that the even end of an exchange edge is always to the left of the odd end. Also note that any node which occurs between w[0]w' and w[1]w' (brackets mark the $i$th bit) must be represented as w[0]w'' where $w'' > w'$ or as w[1]w''' where $w''' < w'$. In either case, the exchange edge incident to the overlapped node extends beyond the exchange edge linking w[0]w' to w[1]w'. Since there are at most $2^{k-i} - 1$ nodes between w[0]w' and w[1]w', these facts imply that at most $2^{k-i}$ exchange edges can overlap at any point of the $i$th row. This observation completes the argument that the near-optimal layout requires only $O(N^2(\log\log N)^2/\log^2 N)$ area.

3.3 An Optimal $O(N^2/\log^2 N)$-Area Layout

In  this  section,  we  will  modify  the  layout  described  in  section  3.2  in  order  to 
produce an optimal $O(N^2/\log^2 N)$-area layout for the $N$-node shuffle-exchange
graph.  In  particular,  we  will  relocate  the  primary  and  secondary  nodes  of  each 
necklace  so  that  they  are  closer  to  and  in  the  same  row  as  the  nodes  to  which  they 
are  linked  via  an  exchange  edge.  Before  going  into  the  details  of  this  relocation, 
however,  it  is  necessary  to  introduce  some  additional  terminology. 

3.3.1 More Definitions

In  order  to  construct  an  optimal  layout  for  the  shuffle-exchange  graph,  we  have 
found  it  necessary  to  break  up  each  necklace  into  two  or,  possibly,  three  pieces. 
The  basic  piece  of  each  necklace  consists  of  all  those  nodes  which  are  neither 
primary  nor  secondary.  The  primary  piece  of  each  necklace  consists  of  the  primary 
nodes  while  the  secondary  piece  consists  of  the  secondary  nodes  (if  there  are  any). 
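For example, the basic piece of <01011> is {0[1]011, 010[1]1, 0101[1]}, the primary piece is {[0]1011}, and the secondary piece is {01[0]11}, where the bracketed bit of the distinguished node 01011 is the bit that becomes the last bit of the node.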

It  is  also  necessary  to  extend  the  notion  of  a  distinguished  node  to  include  pieces 
of  necklaces.  The  distinguished  node  of  a  basic  piece  is  the  same  as  the 
distinguished  node  of  the  associated  necklace.  The  distinguished  node  of  a  primary 
piece of a necklace is that node of the necklace which becomes distinguished when
we  ignore  the  primary  block  of  zeros  (i.e.,  when  we  temporarily  replace  the 
primary  block  of  zeros  in  each  node  of  the  necklace  with  an  equal-length  block  of 
ones).  Similarly,  the  distinguished  node  of  a  secondary  piece  of  a  necklace  is  that 
node  which  becomes  distinguished  when  we  ignore  the  secondary  block  of  zeros. 
For example, 010110111 is the distinguished node of the basic piece of <010110111>, 011011101 is the distinguished node of the primary piece, and 011101011 is the distinguished node of the secondary piece. Note that the distinguished nodes of the primary and secondary pieces of any necklace are necessarily odd nodes and thus are contained in the basic piece of the necklace.

It is important to note that some necklaces (such as <01111>) have a distinguished node but do not have a distinguished node for the primary or secondary piece of the necklace. Fortunately, arguments such as those used to prove Lemmas 3-3 and 3-4 can be used to show that at most $O(N/\log N)$ nodes are contained in such necklaces. Thus, we can assume henceforth that every piece of every necklace has an associated distinguished node.
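The distinguished nodes of the pieces can be computed by masking a block of zeros with ones, selecting the distinguished rotation of the masked string, and applying the same rotation to the original node. The sketch below follows that reading of the definition; all names are hypothetical, and the input necklace is assumed to have a uniquely distinguished node and to be non-periodic.

```python
from typing import Optional

def distinguished_node(s: str) -> Optional[str]:
    """Distinguished node of the necklace of s, or None if none is unique."""
    if "0" not in s or "1" not in s:
        return None
    nodes = {s[i:] + s[:i] for i in range(len(s))}
    cands = [n for n in nodes if n[0] == "0" and n[-1] == "1"]
    lead = lambda n: len(n) - len(n.lstrip("0"))
    m = max(lead(n) for n in cands)
    cands = [n for n in cands if lead(n) == m]
    if len(cands) == 1:
        return cands[0]
    keyed = sorted((n.find("0" * m, m), n) for n in cands)
    return None if keyed[0][0] == keyed[1][0] else keyed[0][1]

def piece_distinguished(s: str) -> dict:
    """Distinguished node of the basic, primary and (if any) secondary piece."""
    d = distinguished_node(s)
    k, m = len(d), len(d) - len(d.lstrip("0"))
    result = {"basic": d}
    blocks = {"primary": 0}
    sec = d.find("0" * m, m)
    if sec != -1:
        blocks["secondary"] = sec
    for name, start in blocks.items():
        # Ignore the block by temporarily replacing it with ones.
        masked = d[:start] + "1" * m + d[start + m:]
        dm = distinguished_node(masked)
        if dm is None:
            result[name] = None
            continue
        shift = next(j for j in range(k) if masked[j:] + masked[:j] == dm)
        result[name] = d[shift:] + d[:shift]
    return result

print(piece_distinguished("010110111"))
print(piece_distinguished("01111"))
```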

3.3.2  Location  of  the  Nodes 

As in section 3.2, the layout is constructed from a $\log N \times O(N/\log N)$ grid of nodes. Each column of the grid corresponds to a piece of a necklace. The nodes of each piece are arranged within a column so that a node of the form $a_{k-1} \cdots [a_{k-i}] \cdots a_0$, with the $i$th bit marked (where $a_{k-1} \cdots a_0$ is assumed to be the distinguished node of the associated piece), is placed in the $i$th row of the grid. Note that nodes in the basic piece of any necklace (these include all odd nodes) are in the same row as they were in the near-optimal layout described in section 3.2. The columns are ordered from left to right so that the values of the distinguished nodes of the associated pieces form a nondecreasing sequence. For example, we have constructed such a grid for $k = 5$ in Figure 3-4.

Note that the necklaces <00001>, <00011>, <00111>, and <01111> have not been included in Figure 3-4 since their associated primary pieces do not have distinguished nodes.

3.3.3  Insertion  of  the  Edges 

As each necklace is broken up into at most four contiguous pieces in the modified grid (the basic piece may have been broken up into two contiguous pieces), the shuffle edges can be inserted with the addition of at most $O(N/\log N)$ vertical and horizontal tracks. In what follows, we will show that at most $O(N/\log N)$ vertical and horizontal tracks are needed to insert all of the exchange edges as well. Thus the area of the layout will be $O(N^2/\log^2 N)$, which is optimal.

As before, we divide the analysis of the exchange edges into two parts. We first show that at most $O(N/\log N)$ exchange edges link nodes which are in different rows of the grid. Such edges can thus be trivially inserted with the addition of at most $O(N/\log N)$ vertical and horizontal tracks. We then show that those exchange edges which link two nodes in the same row can be inserted with the addition of only $O(N/\log N)$ horizontal tracks. The arguments will be very similar to those in section 3.2.2.

(a)  Exchange  Edges  Which  Link  Nodes  in  Different  Rows 

Consider  an  exchange  edge  which  links  two  nodes  which  are  in  different  rows  of 
the  grid.  Since  only  primary  and  secondary  nodes  have  been  relocated,  we  can 
conclude  from  the  arguments  of  section  3.2.2a  that  the  even  node  which  is  incident 
to  the  edge  is  either  a  primary  or  secondary  node.  In  what  follows,  we  will  show 
that  the  even  node  is,  in  fact,  a  primary  node. 

Assume for the purposes of contradiction that the even node is a secondary node. Then this node can be represented as w[0]w' (brackets mark the $i$th bit, which is the last bit of the node) where w0w' is the distinguished node of the secondary piece of <w0w'> and $|w| = i-1$ for some $i$. By definition, w[0]w' is located in the $i$th row of the grid and is linked to w[1]w' via the exchange edge. Since w[1]w' is odd, it is contained in the basic piece of <w1w'>. By assumption, w[1]w' is not also in the $i$th row and thus w1w' cannot be the distinguished node of <w1w'>. Since the lengths of the two blocks of zeros in w1w' created by switching the $i$th bit from 0 to 1 are less than the length of the primary block of zeros (in fact, the sum of their lengths is precisely one less than the length of the primary block), w1w' will be the distinguished node of <w1w'> precisely when w0w' is the node distinguished in <w0w'> by ignoring the secondary block of zeros. By definition, this is the case precisely when w0w' is the distinguished node of the secondary piece of <w0w'>. By assumption, w0w' is the distinguished node of the secondary piece of <w0w'> and thus we can conclude that w1w' is the distinguished node of <w1w'>, a contradiction.


Next consider a primary node which is incident to an exchange edge linking two nodes in different rows of the grid. By the preceding arguments, this node must be of the form $w1\,0^{t_1}[0]\,0^{t_2}\,1w'$ (a block of $t_1$ zeros, the marked 0, and a block of $t_2$ zeros) where $w1\,0^{t_1+t_2+1}\,1w'$ is the distinguished node of the primary piece of its necklace and either $t_1$ or $t_2$ is larger than or equal to the length of the longest block of zeros in $w11w'$. Otherwise, $w1\,0^{t_1}1\,0^{t_2}\,1w'$ would (by definition) be the distinguished node of its necklace and thus $w1\,0^{t_1}[1]\,0^{t_2}\,1w'$ would be on the same row as $w1\,0^{t_1}[0]\,0^{t_2}\,1w'$, a contradiction. Each necklace contains at most $2r$ such primary nodes where $r$ is the difference between the lengths of the longest and second longest block of zeros in any string of the necklace. By Lemma 3-3, we can conclude that there are at most $O(N/\log N)$ such primary nodes in the entire shuffle-exchange graph. Thus, at most $O(N/\log N)$ exchange edges link nodes which are in different rows.


(b)  Exchange  Edges  Which  Link  Nodes  in  the  Same  Row 

Using the analysis developed in section 3.2.2b, it is not difficult to show that at most $O(N/\log N)$ horizontal tracks are needed to insert the exchange edges which link two nodes that are in the same row. In particular, there are still only $O(N/\log N)$ odd nodes in the top $\log k$ rows of the grid and thus at most $O(N/\log N)$ exchange edges are contained in the top $\log k$ rows. These can be trivially inserted with the addition of just $O(N/\log N)$ horizontal tracks.

Again following the methods of section 3.2.2b, it is not difficult to show that two exchange edges overlap on the $i$th row only if the first $i$ bits of the associated nodes are identical. Thus at most $2^{k-i}$ tracks are needed to insert all of the exchange edges in the $i$th row for all $i > \log k$. Summing, we can again conclude that at most $O(N/\log N)$ additional horizontal tracks are needed to insert the remaining exchange edges.

3.3.4 Comments


The  methods  developed  in  this  chapter  can  be  used  to  find  several  other  optimal 
layouts  for  the  shuffle-exchange  graph.  The  key  variant  is  the  method  by  which  a 
node  is  distinguished.  In  particular,  this  method  must  be  impervious  to  small 
alterations  in  the  necklace.  (This  is  so  that  most  exchange  edges  will  link  nodes 
which  are  in  the  same  row  of  the  grid.)  Only  by  changing  the  value  of  a  bit  in  a 
small segment of the necklace (such as in the primary or secondary block of zeros)
should  we  be  able  to  globally  change  the  distinguished  node. 

Another  method  of  distinguishing  a  node  is  to  select  that  node  in  the  necklace 
which  has  the  minimal  value.  Although  the  proof  is  very  difficult,  it  can  be  shown 
that the layout for the $N$-node shuffle-exchange graph constructed in this manner has at most $O(N^2/\log^2 N)$ area. In the following section we will describe additional methods of distinguishing nodes.

At  this  point,  we  should  also  note  that  the  layout  just  described  is  not  known  to 
have  optimal  maximum  edge  length.  In  Part  II  of  the  thesis,  we  show  that  every 
layout of the $N$-node shuffle-exchange graph must have some edge of length at least $\Omega(N/\log^2 N)$. All the layouts we have considered thus far contain wires of length $\Theta(N/\log N)$.

3.4  Layouts  With  Additional  Edges 

For  some  applications  (such  as  the  calculation  of  the  discrete  Fourier  transform), 
it  is  useful  to  consider  networks  which  have  more  than  just  shuffle  and  exchange 
edges.  In  particular,  we  will  be  interested  in  layouts  for  the  shuffle-exchange  graph 
which  also  include  shift,  reverse  and  transpose  edges.  In  what  follows,  we  will 
show  how  to  modify  the  optimal  layout  for  the  shuffle-exchange  graph  so  that 
these additional edges can be inserted without increasing the total area by more
than  a  constant  factor. 

3.4.1  Shift  Edges 

Shift edges link the ith node to the (i+1)st node for all odd i. When combined
with the exchange edges, the resulting network will have links between the ith and
the (i+1)st nodes for all i. The inclusion of such edges facilitates the computation
of discrete Fourier transforms at sequential intervals of a continuous signal. In
such applications, the input data contained in the ith processor is shifted to the
(i+1)st processor for each i after each computation of a discrete Fourier transform.
The graph consisting of shuffle, exchange and shift edges is known as the
shuffle-shift graph.
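
The three edge types can be written down directly from these definitions. The sketch below is illustrative only: the function names are hypothetical, the shuffle is modelled as a cyclic left shift, and it assumes there is no wrap-around shift edge from node N-1 back to node 0.

```python
def rotate_left(x, k):
    """Cyclically shift the k-bit value x left by one bit (one shuffle step)."""
    mask = (1 << k) - 1
    return ((x << 1) & mask) | (x >> (k - 1))


def shuffle_shift_edges(k):
    """Return the shuffle, exchange and shift edge sets of the 2**k-node graph."""
    n = 1 << k
    # Shuffle edges are directed x -> rotate_left(x); fixed points are skipped.
    shuffle = {(x, rotate_left(x, k)) for x in range(n) if rotate_left(x, k) != x}
    # Exchange edges join each even node to the next odd node.
    exchange = {(i, i + 1) for i in range(0, n, 2)}
    # Shift edges join each odd node to the next even node (assumed: no wrap-around).
    shift = {(i, i + 1) for i in range(1, n - 1, 2)}
    return shuffle, exchange, shift


if __name__ == "__main__":
    shuffle, exchange, shift = shuffle_shift_edges(3)
    print(sorted(exchange | shift))
```

Taken together, the exchange and shift edges form a single path through nodes 0, 1, ..., N-1, which is the property used when data is shifted from each processor to its successor.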

Using the methods developed in section 3.3, it is not difficult to show that the
N-node shuffle-shift graph can be laid out using only $O(N^2/\log^2 N)$ area. As
before, the necklaces are broken into two or three pieces and placed in a grid
according to the value of the associated distinguished node. Thus the shuffle edges
can be inserted as before using only $O(N/\log N)$ vertical and horizontal tracks.

For most odd nodes, adding 1 to the value of the node changes only a
relatively small number of bits at the end of the string. Thus it can be shown that
at most $O(N/\log N)$ shift edges link nodes which are in different rows. These can
be easily inserted using only $O(N/\log N)$ vertical and horizontal tracks. Of those
edges which link nodes in the same row, at most $O(N/\log N)$ are contained in the
first $\log k$ rows. For $i > \log k$, at most $2^{k-i}$ shift edges overlap at any point of the ith
row. By introducing an extra vertical track for each necklace piece, it is possible to
separate the layout of the shift edges on each level from that of the exchange
edges. Thus both can be inserted simultaneously in the ith row using only $O(2^{k-i})$
total horizontal tracks. By the arguments of section 3.3, this means that at most
$O(N/\log N)$ additional horizontal tracks are needed to embed all of the remaining
shift and exchange edges, thus completing the argument.

3.4.2  Reverse  Edges 

Reverse edges link pairs of nodes that are associated with binary strings which
are reverses of each other. For example, $a_{k-1} \cdots a_0$ is linked to $a_0 \cdots a_{k-1}$ via a
reverse edge. Since the algorithm which computes discrete Fourier transforms on
the shuffle-exchange network leaves the output for node $a_{k-1} \cdots a_0$ in node
$a_0 \cdots a_{k-1}$, reverse edges provide a fast and convenient way of straightening out
the solution. The graph consisting of shuffle, exchange, shift and reverse edges will
be referred to as the shuffle-shift-reverse graph.
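
A reverse edge is determined entirely by bit reversal of the node label. The following sketch (hypothetical helper names, not taken from the thesis) computes the partner of each node and lists the reverse edges, omitting palindromic labels, which would only produce self-loops.

```python
def reverse_bits(x, k):
    """Return the k-bit value whose binary string is the reverse of that of x."""
    r = 0
    for _ in range(k):
        r = (r << 1) | (x & 1)   # append the lowest bit of x to r
        x >>= 1
    return r


def reverse_edges(k):
    """List the undirected reverse edges of the 2**k-node graph."""
    edges = set()
    for x in range(1 << k):
        y = reverse_bits(x, k)
        if x != y:               # palindromes are their own reverse
            edges.add((min(x, y), max(x, y)))
    return sorted(edges)


if __name__ == "__main__":
    print(reverse_edges(3))
```

For k = 3 the only reverse edges are (1, 4) and (3, 6); the other four labels are palindromes.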

Using the techniques developed in section 3.3, it is also possible to show that the
N-node shuffle-shift-reverse graph can be laid out in $O(N^2/\log^2 N)$ area. The basic
idea is to modify the layout described in section 3.4.1 so that

1) pieces of necklaces which are reverses of each other are paired together in
the left-to-right ordering, and

2)  pieces  of  necklaces  are  folded  in  half. 

The first constraint ensures that the maximal overlaps of the reverse edges in
each row will be small while the second constraint ensures that most reverse edges
link nodes which are in the same row. Although it is not immediately obvious, it
can be checked that these modifications do not substantially change the procedure
for inserting the shuffle, shift and exchange edges which was described in section
3.4.1. Thus all of the edges can be inserted using at most $O(N/\log N)$ vertical and
horizontal tracks.

3.4.3  Transpose  Edges 

Transpose edges link the ith node to the (N-1-i)th node for each i. Viewed in
terms of binary strings, transpose edges link each node to its complement.
Although  we  do  not  know  of  any  specific  applications  of  transpose  edges,  they 
would  be  useful  for  problems  that  require  frequent  transposition  of  the  data. 
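
The equivalence between the arithmetic definition (i is linked to N-1-i) and the bit-string definition (each node is linked to its complement) is easy to confirm. The sketch below uses hypothetical helper names and is meant only as a check of that identity.

```python
def complement(i, k):
    """Flip every bit of the k-bit value i."""
    return i ^ ((1 << k) - 1)


def transpose_edges(k):
    """List the undirected transpose edges {i, N-1-i} of the 2**k-node graph."""
    n = 1 << k
    return sorted({(min(i, n - 1 - i), max(i, n - 1 - i)) for i in range(n)})


if __name__ == "__main__":
    k = 5
    # N-1 is the all-ones string, so subtracting i from it flips each bit of i.
    print(all(complement(i, k) == (1 << k) - 1 - i for i in range(1 << k)))
    print(transpose_edges(2))
```

No node is its own complement, so the N nodes are matched into exactly N/2 transpose edges.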

By further modifying the optimal layout for the shuffle-shift-reverse graph, it is
possible  to  add  transpose  edges  without  increasing  the  total  area  by  more  than  a 
constant  factor.  In  particular,  the  layout  should  be  modified  so  that 

1)  pieces  of  necklaces  which  are  complements  of  each  other  are  paired  together 
in  the  left-to-right  ordering,  and 

2)  the  distinguished  node  is  selected  on  the  basis  of  the  location  of  the  longest 
block  of  consecutive  identical  bits  (be  they  zeros  or  ones). 

The first constraint ensures that the maximal overlaps of the transpose edges in
each row are small while the second constraint ensures that most transpose edges
link nodes which are on the same row. Although we do not present the details
here, it is possible to show that such a layout can be constructed using only
$O(N^2/\log^2 N)$ area, the least possible.


Appendix: Proofs of Lemmas 3-1 Through 3-4


We  now  present  the  proofs  of  Lemmas  3-1  through  3-4.  Such  results  can  also  be 
found in the recent work of Guibas and Odlyzko [GO81a,GO81b]. We are deeply
indebted  to  Kleitman  for  suggesting  the  proof  of  Theorem  3-1. 

In what follows, we will write $\Psi_k(t)$ to denote the number of k-bit strings
which do not contain t-1 consecutive zeros. Except for the string of all zeros
(which we ignore), these are precisely the strings which do not contain the
substring $v_t = \overbrace{10 \cdots 0}^{t}$. The proofs of Lemmas 3-1 through 3-4 depend heavily
on the following combinatorial result.

Theorem 3-1: For large t and k,

$$\Psi_k(t) = 2^k e^{-k2^{-t}} e^{O(t2^{-t} + kt2^{-2t})} .$$

Proof: We first count the number $\Psi_k'(t)$ of k-bit strings which do not contain
an occurrence of $v_t$ between the beginning and end of the string (i.e., for the time
being we ignore the occurrences of $v_t$ which begin at the end and end at the
beginning of a string).

Fix t and let $f_i$ denote the number of i-bit strings ending with $v_t$ but which do
not contain any other occurrences of $v_t$ in the string. Set $F(x) = \sum_{i=0}^{\infty} f_i x^i$. Note
that $\Psi_k'(t)$ is the (k+t)th coefficient of $F(x)$. Let $f_i^{(j)}$ denote the number of i-bit
strings ending in $v_t$ which contain precisely j occurrences of $v_t$ and set

$$F^{(j)}(x) = \sum_{i=0}^{\infty} f_i^{(j)} x^i .$$

Since occurrences of $v_t$ cannot overlap, it is not difficult to show that $F^{(j)}(x)$ is
identical to $F(x)^j$ for all $j \ge 1$.

Let $g_i$ be the number of i-bit strings which end in $v_t$ (regardless of the number of
other occurrences of $v_t$ which appear in the string) and set $G(x) = \sum_{i=0}^{\infty} g_i x^i$. Since
$g_i = 2^{i-t}$ for all $i \ge t$, it is easily seen that $G(x) = x^t/(1-2x)$. Also note that

$$G(x) = \sum_{j=1}^{\infty} F^{(j)}(x) = \sum_{j=1}^{\infty} F(x)^j = [1/(1-F(x))] - 1$$

and thus that

$$F(x) = G(x)/(G(x) + 1) = x^t/(1 - 2x + x^t) .$$

Thus $\Psi_k'(t)$ is simply the kth coefficient of $1/(1 - 2x + x^t)$. For example,
$\Psi_4'(2) = 5$, which is the coefficient of $x^4$ in the expansion of $1/(1 - 2x + x^2)$.
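
The coefficient extraction can be checked mechanically. The sketch below (function names are illustrative and not from the thesis) expands $1/(1 - 2x + x^t)$ using the recurrence $a_n = 2a_{n-1} - a_{n-t}$ implied by the denominator, and compares the result with a brute-force count of the k-bit strings that contain no occurrence of the pattern 1 followed by t-1 zeros.

```python
from itertools import product


def series_coefficients(t, n_terms):
    """Coefficients a_0 .. a_{n_terms-1} of 1/(1 - 2x + x^t).

    Multiplying the series by the denominator gives a_0 = 1 and
    a_n = 2*a_{n-1} - a_{n-t} for n >= 1 (terms with negative index are 0).
    """
    a = []
    for n in range(n_terms):
        if n == 0:
            a.append(1)
        else:
            a.append(2 * a[n - 1] - (a[n - t] if n >= t else 0))
    return a


def count_avoiding_linear(k, t):
    """Brute force: k-bit strings with no (non-wrapping) occurrence of 1 0^(t-1)."""
    pattern = "1" + "0" * (t - 1)
    return sum(1 for bits in product("01", repeat=k)
               if pattern not in "".join(bits))


if __name__ == "__main__":
    t = 3
    coeffs = series_coefficients(t, 11)
    for k in range(11):
        print(k, coeffs[k], count_avoiding_linear(k, t))
```

With t = 2 the coefficient of $x^4$ is 5, in agreement with the example in the text; the brute-force column is only practical for small k.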

Let $p(x) = 1 - 2x + x^t$. It is easily observed that $\gcd(p(x), dp(x)/dx) = 1$ and
thus that $p(x)$ does not have any multiple roots for $t > 2$. Thus we can expand

$$p(x)^{-1} = \sum_{i=1}^{t} A_i/(x - r_i)$$

where $\{r_i \mid 1 \le i \le t\}$ is the set of distinct (and possibly complex) roots of $p(x)$ and

$$A_i = \left[ (x - r_i)/p(x) \right]_{x=r_i} = 1 / \left[ dp(x)/dx \right]_{x=r_i}$$

for $1 \le i \le t$. Once the roots of $p(x)$ are known, we can calculate $\Psi_k'(t)$ from
the formula

$$\Psi_k'(t) = -\sum_{i=1}^{t} A_i r_i^{-(k+1)} .$$

Although we do not know how to find the roots of $p(x)$ explicitly for large t, we
can describe them asymptotically. First observe that as $t \to \infty$, the absolute value
of every root must approach either 1/2 or 1. Otherwise the absolute value of one
term of $p(x)$ will dominate the sum of the absolute values of the other two terms.
For example, if $|r| < c < 1/2$ as $t \to \infty$ for some root r and constant c, then
$1 > |2r| + |r^t|$ for large t.

If there are to be any roots r such that $|r| \to 1/2$, it is essential that $r \to 1/2$.
Otherwise, the real part of $p(r)$ cannot vanish for large t. By substituting
$(1/2)e^{\delta(t)}$ for r where $\delta(t) \to 0$ as $t \to \infty$, we find that

$$1 - e^{\delta(t)} + 2^{-t} e^{t\delta(t)} = 0$$

and thus that

$$1 - (1 + \delta(t) + O(\delta(t)^2)) + 2^{-t}(1 + O(t\delta(t))) = 0 .$$

Thus $\delta(t) = 2^{-t} + q(t)$ where $|q(t)| \ll 2^{-t}$ as $t \to \infty$. Another iteration of this
process reveals that $q(t) = O(t2^{-2t})$ and thus that

$$r = (1/2) e^{2^{-t}} e^{O(t2^{-2t})} \quad \text{as } t \to \infty .$$

In fact, there is precisely one root, say $r_1$, which approaches 1/2 as $t \to \infty$.
The absolute values of the remaining roots approach 1. In particular, the absolute
values of these roots must be greater than or equal to 1 for large t. Otherwise there
would be a root r and a function $\epsilon(t) \to 0^+$ such that $|r| = 1 - \epsilon(t)$. But then

$$|2r| = 2 - 2\epsilon(t) > 1 + |1 - \epsilon(t)|^t = 1 + |r^t|$$

for $t > 2$ and it would be impossible for $p(r)$ to vanish for large t, a contradiction.

It remains to compute the $A_i$. Since $dp(x)/dx = tx^{t-1} - 2$, we find that
$A_1 = -(1/2) + O(t2^{-t})$ and that $A_i = O(1/t)$ for $2 \le i \le t$. Thus

$$\Psi_k'(t) = O(1) - [-1/2 + O(t2^{-t})] \, 2^{k+1} e^{-(k+1)2^{-t}} e^{O(kt2^{-2t})} .$$

Replacing $1 + O(t2^{-t})$ with $e^{O(t2^{-t})}$ and simplifying, we conclude that

$$\Psi_k'(t) = 2^k e^{-k2^{-t}} e^{O(t2^{-t} + kt2^{-2t})}$$

for large t and k.

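The only strings which are included in the count of $\Psi_k'(t)$ but not in that of
$\Psi_k(t)$ are those of the form $\overbrace{0 \cdots 0}^{i} \, w \, 1 \, \overbrace{0 \cdots 0}^{t-i-1}$ where $1 \le i \le t-1$ and w is a string
which is included in the count of $\Psi_{k-t}'(t)$. Thus

$$\begin{aligned}
\Psi_k(t) &= \Psi_k'(t) - (t-1) \Psi_{k-t}'(t) \\
&= 2^k e^{-k2^{-t}} e^{O(t2^{-t} + kt2^{-2t})} - (t-1) 2^{k-t} e^{-(k-t)2^{-t}} e^{O(t2^{-t} + kt2^{-2t})} \\
&= 2^k e^{-k2^{-t}} e^{O(t2^{-t} + kt2^{-2t})}
\end{aligned}$$

for large t and k. This completes the proof of the theorem □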
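
The wrap-around correction used in the last step of the proof can be tested on small cases. The sketch below is an illustration under stated assumptions: it takes the linear count to be the kth coefficient of $1/(1 - 2x + x^t)$, subtracts t-1 times the linear count for length k-t, and compares the result with a brute-force count of strings in which the pattern 1 followed by t-1 zeros does not occur even cyclically. All function names are hypothetical. Both counts include the all-zero string, which the text sets aside.

```python
import math
from itertools import product


def linear_count(k, t):
    """kth coefficient of 1/(1 - 2x + x^t): strings with no linear occurrence."""
    a = [1]
    for n in range(1, k + 1):
        a.append(2 * a[n - 1] - (a[n - t] if n >= t else 0))
    return a[k]


def cyclic_count_formula(k, t):
    """Linear count minus the strings whose only occurrence wraps around (k >= t)."""
    return linear_count(k, t) - (t - 1) * linear_count(k - t, t)


def cyclic_count_brute(k, t):
    """Brute force over all k-bit strings, treating each string as a cycle."""
    pattern = "1" + "0" * (t - 1)
    total = 0
    for bits in product("01", repeat=k):
        s = "".join(bits)
        # Appending the first t-1 symbols exposes matches that wrap past the end.
        if pattern not in s + s[: t - 1]:
            total += 1
    return total


def leading_term(k, t):
    """The main factor 2^k * exp(-k / 2^t) from Theorem 3-1, without error terms."""
    return 2.0 ** k * math.exp(-k / 2.0 ** t)


if __name__ == "__main__":
    for k, t in [(10, 3), (12, 4), (40, 8)]:
        print(k, t, cyclic_count_formula(k, t), leading_term(k, t))
```

The theorem is an asymptotic statement for large t and k, so close agreement between the exact count and `leading_term` should not be expected for the smallest cases.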

We  can  now  prove  Lemmas  3-1  and  3-2. 

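Proof of Lemma 3-1: From the definition, we know that

$$\begin{aligned}
\Phi_k(t) &= \Psi_k(t+2) - \Psi_k(t+1) \\
&= 2^k e^{-k2^{-(t+2)}} e^{O(t2^{-t} + kt2^{-2t})} - 2^k e^{-k2^{-(t+1)}} e^{O(t2^{-t} + kt2^{-2t})}
\end{aligned}$$

for large t and k. For $t > (\log k)/2 + \log\log k$, both $t2^{-t}$ and $kt2^{-2t}$ vanish as
$k \to \infty$. In what follows, we will show that if $t \ll k$, then

$$e^{-k2^{-(t+2)}} - e^{-k2^{-(t+1)}} \gg O(t2^{-t} + kt2^{-2t})$$

and thus that

$$\Phi_k(t) \sim 2^k \left( e^{-k2^{-(t+2)}} - e^{-k2^{-(t+1)}} \right) .$$

Assume for the purposes of contradiction that

$$e^{-k2^{-(t+2)}} - e^{-k2^{-(t+1)}} \le O(t2^{-t} + kt2^{-2t}) .$$

Then $e^{-k2^{-(t+2)}} \sim e^{-k2^{-(t+1)}}$, which means that $e^{-k2^{-(t+2)}} \sim 1$ and
thus that $k2^{-(t+2)} \to 0$. Thus we can use a Taylor series expansion of the
exponentials to find that

$$\begin{aligned}
e^{-k2^{-(t+2)}} - e^{-k2^{-(t+1)}} &\sim (1 - k2^{-(t+2)}) - (1 - k2^{-(t+1)}) \\
&= k2^{-(t+2)} \\
&\gg O(t2^{-t} + kt2^{-2t})
\end{aligned}$$

provided that $t \ll k$, a contradiction □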

Proof of Lemma 3-2: The number of k-bit strings which do not contain a block
of $\log k - \log\ln k - 1$ consecutive zeros is

$$\begin{aligned}
\Psi_k(\log k - \log\ln k) &\sim 2^k e^{-k2^{-\log k + \log\ln k}} \\
&= 2^k/k \\
&= O(N/\log N) .
\end{aligned}$$

The number of k-bit strings which contain a block of $2\log k + 1$ consecutive zeros
is

$$\begin{aligned}
2^k - \Psi_k(2\log k + 2) &\sim 2^k - 2^k e^{-k2^{-2\log k - 2}} e^{O((\log k)/k^2)} \\
&\sim 2^k - 2^k [1 - 1/(4k) + O((\log k)/k^2)] \\
&\sim 2^k/(4k) \\
&= O(N/\log N)
\end{aligned}$$

□

The proofs of Lemmas 3-3 and 3-4 depend on the following corollary to
Theorem 3-1.

Corollary 3-1: For bounded m and p and large k and t,

$$\sum_{t} \Psi_{k-mt+p}(t) = O(2^k/k^m) .$$

Proof: We first observe that for $t \le 2\log k/3$,

$$\begin{aligned}
\Psi_{k-mt+p}(t) &\le \Psi_k(2\log k/3) \\
&\sim 2^k e^{-k2^{-2\log k/3}} \\
&= 2^k e^{-k^{1/3}}
\end{aligned}$$

and thus that

$$\sum_{t \le 2\log k/3} \Psi_{k-mt+p}(t) \le (2/3) \log k \; 2^k e^{-k^{1/3}} \ll 2^k/k^m$$

for any finite m and p as $k \to \infty$.

For larger values of t,

$$\Psi_{k-mt+p}(t) \sim 2^{k-mt+p} e^{-k2^{-t}}$$

and thus

$$\sum_{t > 2\log k/3} \Psi_{k-mt+p}(t) \sim \sum_{t > 2\log k/3} 2^{k-mt+p} e^{-k2^{-t}} .$$

By making the change of variables $\tau = t - \log k$, we can see that the preceding
sum is at most

$$(2^{k+p}/k^m) \sum_{\tau=-\infty}^{\infty} 2^{-m\tau} e^{-2^{-\tau}}$$

and thus at most $O(2^k/k^m) = O(N/\log^m N)$ □


Proof of Lemma 3-3: A string whose longest block of zeros has length t and
whose second longest block of zeros has length $s < t$ is of the form $w 1 \overbrace{0 \cdots 0}^{t} w'$,
where the longest block of zeros in ww' has length s. By definition, there are at
most $\Phi_{k-t-1}(s)$ such strings. Thus the sum over all necklaces of the difference
between the sizes of the longest block and second longest block of zeros is at most

$$\begin{aligned}
\sum_{t} \sum_{s < t} (t-s) \, \Phi_{k-t-1}(s)
&= \sum_{t} \sum_{s < t} (t-s) \, [\Psi_{k-t-1}(s+2) - \Psi_{k-t-1}(s+1)] \\
&\le \sum_{t} \sum_{s \le t} \Psi_{k-t-1}(s+1) \\
&< \sum_{s \ge 1} \left( 2^k e^{-k2^{-s}} e^{O(s2^{-s} + ks2^{-2s})} \, 2^{-s} e^{O(s2^{-s})} \right) \\
&= \sum_{s \ge 1} 2^{k-s} e^{-k2^{-s}} e^{O(s2^{-s} + ks2^{-2s})} \\
&= O(N/\log N)
\end{aligned}$$

by Corollary 3-1 □


Proof of Lemma 3-4: Consider a necklace which fails to have a uniquely
distinguished  node.  Each  node  in  such  a  necklace  must  have  one  of  the  following 
three  forms: 


*iz 

1)  wfi. •  •  •  *  •  Qw? , 

t  t 

*■ 

2)  i VfQ‘  *  •  *  •  •OjWj  or 

r  ^V* 

where t is the length of the longest block of zeros in any of the strings. It is easily
seen that there are at most

1) $k \sum_{t=1}^{k/2} \Psi_{k-2t}(t+2)$ nodes of the first type,

2) $k^2 \sum_{t=1}^{k/3} \Psi_{k-3t}(t+2)$ nodes of the second type and

3) $k^3 \sum_{t=1}^{k/4} \Psi_{k-4t}(t+2)$ nodes of the third type.

By Corollary 3-1, we can thus conclude that there are at most $O(N/\log N)$ such
nodes altogether □


CHAPTER 4


PRACTICAL LAYOUTS


Although the $O(N^2/\log^2 N)$-area layout for the shuffle-exchange graph described
in Chapter 3 is (up to a constant) asymptotically optimal, it is not optimal for small
values of N (e.g., N = 128). In fact, none of the general layout procedures thus far
discussed  provide  good  layouts  for  small  shuffle-exchange  graphs.  For  practical 
applications,  however,  these  are  precisely  the  shuffle-exchange  graphs  for  which  we 
need  good  layouts. 

In this chapter, we describe techniques for finding good layouts for small
shuffle-exchange graphs. Although the techniques (which are described in section 4.2) do
not  yet  constitute  a  general  procedure  for  finding  truly  optimal  layouts  for  all 
shuffle-exchange  graphs,  they  can  be  used  to  find  "very  nice"  layouts  for  "small" 
shuffle-exchange graphs. As examples, we have included layouts for the 8-node,
16-node, 32-node, 64-node and 128-node shuffle-exchange graphs in section 4.3.
The  layouts  are  "very  nice"  in  the  sense  that: 

1)  they  require  much  less  area  than  previously  discovered  layouts, 

2)  they  have  a  certain  natural  structure  which  facilitates  efficient  layout 
description,  chip  manufacture  and  I/O  management,  and 

3)  they  require  the  minimal  amount  of  area  for  layouts  with  such  structure. 

4.1  Preliminaries 

We  have  chosen  to  use  the  Thompson  grid  model  [T80]  to  illustrate  our 
techniques  because  of  its  widespread  acceptance  and  its  simplicity.  For  practical 
layouts,  however,  the  assumption  that  processors  can  be  represented  by  points  is 
clearly false. Nonetheless, we show in section 4.1.1 that good Thompson model
layouts  can  still  be  used  to  find  good  practical  layouts.  Thus  we  will  be  able  to  rest 
assured  that  the  Thompson  model  is,  in  fact,  an  acceptable  means  for  describing 
practical  layouts  of  the  shuffle-exchange  graph. 

We must also be sure that the layouts we design can be effectively used in
practice. For example, it is important that the layouts have a suitable input/output
structure so that data can be put on and taken off the chip efficiently. In section
4.1.2, we describe a general class of layouts for the shuffle-exchange graph which
appear  to  satisfy  such  constraints.  The  remainder  of  the  chapter  will  then  be 
devoted  to  finding  optimal  layouts  within  this  class. 

4.1.1  A  Closer  Look  at  the  Thompson  Model 

The  manner  in  which  the  Thompson  model  is  useful  for  describing  practical 
layouts  varies  with  the  size  of  the  processors  involved.  For  example,  if  one  desires 
to  use  the  shuffle-exchange  graph  as  a  permuter,  then  each  processor  need  only 
contain  k  storage  registers  and  some  I/O  hardware.  Such  a  processor  can  be  easily 
hardwired  in  a  kxk  square.  In  order  to  achieve  maximum  parallelism,  each  wire  of 
the Thompson model layout is reproduced k times so that an entire k-bit word can
be transmitted in one time step. For example, the optimal 2x6 Thompson model
layout for the 8-node shuffle-exchange graph (which is shown in Figure 4-3 in
section  4.3)  can  be  transformed  into  the  more  realistic  6x18  layout  shown  in  Figure 
4-1  by  tripling  the  grid  lines  and  replacing  the  point  processors  by  3x3  boxes  (into 
which  the  guts  of  each  processor  can  later  be  wired). 


Figure 4-1: A transformed Thompson model layout
for  the  8-node  shuffle-exchange  graph. 

For  some  applications,  the  processors  themselves  require  an  entire  chip.  For 
example,  every  processor  of  a  shuffle-exchange  graph  used  to  compute  discrete 
Fourier  transforms  must  be  equipped  with  a  floating  point  multiplier.  Using  the 
best  technology  currently  available,  only  a  few  floating  point  multipliers  can  be 
wired onto a single chip. In this case, a Thompson model layout can be used to
design an efficient layout of chips where each chip contains a single processor.
(Such  a  device  is  currently  under  development  at  IBM.)  The  wires,  as  before,  are 
replicated  to  achieve  maximum  parallelism  but  now  serve  as  links  between  chips. 
Since  the  wires  must  be  much  wider  in  such  a  device,  the  side  length  of  a  processor 
(the  chip)  is  about  the  same  as  the  combined  width  of  all  the  wires  (pins)  attached 
to  it.  By  following  an  expansion  procedure  similar  to  the  one  described  in  the 
previous  example,  a  good  Thompson  model  layout  can  thus  be  used  to  design  a 
good practical layout.

4.1.2  A  Class  of  Practical  Layouts 

In  this  chapter,  we  will  consider  layouts  for  the  shuffle-exchange  graph  for 
which: 

1) each necklace appears as a rectangle consisting of arbitrarily long segments
of  two  vertical  tracks  and  unit  length  segments  of  two  horizontal  tracks, 

2)  the  horizontal  tracks  are  divided  into  pairs,  each  pair  containing  at  most  one 
full  necklace  and  any  number  of  degenerate  necklaces,  and 

3) each exchange edge appears as a horizontal line segment.

For example, the layouts described in Chapter 2 have this form.

Such  layouts  are  particularly  well  suited  for  practical  implementation  since  their 
structure facilitates efficient description, chip manufacture and data management.
For example, by attaching a pin to each of the $\Theta(N/\log N)$ necklaces (this is
feasible for small N), it is possible to load N input values into an N-processor
shuffle-exchange chip in just $O(\log N)$ steps.
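
The number of pins required by this loading scheme is the number of necklaces, which can be counted exactly. The sketch below applies the standard orbit-counting (Burnside) formula for binary necklaces of length k and compares the count with N/k; the function name is hypothetical and the comparison is only meant to illustrate the N/log N growth.

```python
from math import gcd


def necklace_count(k):
    """Number of binary necklaces of length k (orbits of k-bit strings under rotation).

    By Burnside's lemma, a rotation by j positions fixes 2**gcd(j, k) strings,
    and the number of orbits is the average over the k rotations.
    """
    return sum(2 ** gcd(j, k) for j in range(k)) // k


if __name__ == "__main__":
    for k in range(3, 14):
        n = 1 << k
        print(k, n, necklace_count(k), round(n / k, 1))
```

Since every necklace holds at most k = log N nodes, one pin per necklace lets each necklace be filled in about k shuffle steps, which is consistent with the O(log N) loading time stated above.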

Even  more  importantly,  we  will  show  in  the  following  section  how  to  find 
layouts  with  the  above  form  which  require  very  small  amounts  of  area.  Thus  very 
little  is  lost  by  restricting  our  attention  to  such  layouts. 

4.2  Optimization  Techniques 

In  this  section,  we  explain  how  to  find  layouts  for  small  shuffle-exchange  graphs 
which  are  optimal  up  to  the  constraints  described  in  section  4.1.2.  For  the  most 
part  our  methods  are  comprised  of  common  sense,  heuristics  and  exhaustive 
searches. 


4.2.1  Ordering  the  Necklaces 


The  first  step  in  finding  optimal  layouts  of  the  form  described  in  section  4.1.2  is 
to  order  the  necklaces  from  left  to  right  so  that  the  number  of  exchange  edges 
which  overlap  at  each  point  of  the  ordering  is  kept  small.  More  precisely,  we  wish 
to  find  an  ordering  of  the  necklaces  for  which  the  maximum  number  of  exchange 
edges  overlapping  at  any  point  is  minimized.  For  example,  no  more  than  6 
exchange  edges  overlap  at  any  point  of  the  ordering  used  to  produce  the  layout  for 
the 32-node shuffle-exchange graph shown in Figure 4-2. If we switched the
necklace <3> with <7>, however, 9 exchange edges would overlap in the gap
between <7> and <5>. Since the maximum overlap is a lower bound on the
number  of  horizontal  tracks  necessary  to  insert  the  exchange  edges,  we  can  easily 
see  that  the  latter  ordering  is  inferior  since  any  layout  it  produces  must  have  at 
least  9  horizontal  tracks.  Note  that  the  layout  in  Figure  4-2  has  just  6  horizontal 
tracks. 
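
The overlap of an ordering can be computed directly from the node labels. In the sketch below each necklace is identified by its minimal value, a shuffle is modelled as a cyclic left shift, and the overlap at a gap is the number of exchange edges whose two necklaces lie on opposite sides of that gap. The function names are hypothetical and the code is an illustration, not the procedure used to produce the figures.

```python
def rotate_left(x, k):
    """Cyclically shift the k-bit value x left by one bit (one shuffle step)."""
    mask = (1 << k) - 1
    return ((x << 1) & mask) | (x >> (k - 1))


def necklace_of(k):
    """Map every k-bit value to the minimal value on its necklace."""
    rep = {}
    for x in range(1 << k):
        y, best = x, x
        for _ in range(k):
            y = rotate_left(y, k)
            best = min(best, y)
        rep[x] = best
    return rep


def overlap_profile(k, ordering):
    """Number of exchange edges crossing each gap of a left-to-right ordering.

    `ordering` lists the necklaces by minimal value; gap g lies between
    positions g and g+1.
    """
    position = {rep: idx for idx, rep in enumerate(ordering)}
    rep = necklace_of(k)
    gaps = [0] * (len(ordering) - 1)
    for even in range(0, 1 << k, 2):          # exchange edge (even, even+1)
        a = position[rep[even]]
        b = position[rep[even + 1]]
        for g in range(min(a, b), max(a, b)):
            gaps[g] += 1
    return gaps


if __name__ == "__main__":
    good = [0, 1, 3, 5, 7, 11, 15, 31]
    swapped = [0, 1, 7, 5, 3, 11, 15, 31]     # necklaces <3> and <7> exchanged
    print(overlap_profile(5, good))
    print(overlap_profile(5, swapped))
```

For the ordering <0> <1> <3> <5> <7> <11> <15> <31> the maximum overlap is 6, whereas exchanging <3> and <7> raises the overlap in the gap between <7> and <5> to 9.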


<0>  <1>  <3>  <5>  <7>  <11>  <15><31> 


Figure  4-2:  A  good  ordering  of  the  necklaces 
for the 32-node shuffle-exchange graph.


As  we  mentioned  in  Chapter  3,  it  is  not  known  how  best  to  order  the  necklaces 
in  general.  For  small  shuffle-exchange  graphs,  however,  there  are  several  simple 
heuristics  which  produce  optimal  orderings.  For  example,  arrangements  of  the 
necklaces  from  left  to  right  in  order  of  nondecreasing  size  or,  alternatively,  in  order 
of  increasing  minimal  number  represented  are  usually  quite  close  to  optimal  for 
small  shuffle-exchange  graphs.  In  fact,  such  orderings  are  within  a  necklace  swap 
of optimal for $N \le 256$ ($k \le 8$). Note that the ordering displayed in Figure 4-2 could
have been produced by either of these methods.


Probably  the  most  difficult  task  is  proving  that  a  good  ordering  is,  in  fact, 
optimal.  The  techniques  we  have  used  to  prove  optimality  depend  heavily  on 
exhaustive searches. For $k \le 8$, the techniques have succeeded in proving the
optimality of good orderings. For $9 \le k \le 13$, we have found good orderings but
have  been  unable  to  prove  that  they  are  optimal.  We  have  summarized  the  results 
in  Table  4-1.  Note  that  for  each  k,  the  maximum  overlap  of  the  best  known 
ordering  serves  only  as  a  lower  bound  for  the  number  of  horizontal  tracks  that  will 
be  required  for  any  layout  with  that  ordering.  In  some  cases,  additional  horizontal 
tracks  may  be  required. 


Table 4-1


Maximum Overlap of Best Known Orderings


  k       N      maximum overlap of      optimal?
                 best known ordering

  3       8               2                yes
  4      16               3                yes
  5      32               6                yes
  6      64              10                yes
  7     128              18                yes
  8     256              33                yes
  9     512              62                 ?
 10    1024             115                 ?
 11    2048             214                 ?
 12    4096             388                 ?
 13    8192             754                 ?
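
For the very smallest graphs, optimality of an ordering can be confirmed by trying every permutation of the necklaces. The sketch below is a naive illustration of such an exhaustive search (hypothetical function names, shuffle modelled as a cyclic left shift); it is not the search technique used in the thesis, and its running time grows factorially with the number of necklaces.

```python
from itertools import permutations


def rotate_left(x, k):
    """Cyclically shift the k-bit value x left by one bit (one shuffle step)."""
    mask = (1 << k) - 1
    return ((x << 1) & mask) | (x >> (k - 1))


def necklace_of(k):
    """Map every k-bit value to the minimal value on its necklace."""
    rep = {}
    for x in range(1 << k):
        y, best = x, x
        for _ in range(k):
            y = rotate_left(y, k)
            best = min(best, y)
        rep[x] = best
    return rep


def best_ordering(k):
    """Return (minimum possible maximum overlap, one ordering achieving it)."""
    rep = necklace_of(k)
    reps = sorted(set(rep.values()))
    # One (necklace, necklace) pair per exchange edge.
    edges = [(rep[e], rep[e + 1]) for e in range(0, 1 << k, 2)
             if rep[e] != rep[e + 1]]
    best = None
    for order in permutations(reps):
        pos = {r: i for i, r in enumerate(order)}
        gaps = [0] * (len(order) - 1)
        for a, b in edges:
            lo, hi = sorted((pos[a], pos[b]))
            for g in range(lo, hi):
                gaps[g] += 1
        width = max(gaps)
        if best is None or width < best[0]:
            best = (width, order)
    return best


if __name__ == "__main__":
    for k in (3, 4):
        print(k, best_ordering(k))
```

For k = 3 and k = 4 the search returns maximum overlaps of 2 and 3, in agreement with the first two rows of Table 4-1.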

4.2.2  Inserting  the  Exchange  Edges 

The  second  step  in  constructing  optimal  layouts  for  small  shuffle-exchange 
graphs  is  to  insert  the  exchange  edges  using  as  few  horizontal  tracks  as  possible. 


Recall  that  in  Chapter  2,  we  showed  how  to  use  the  complex  plane  diagram  as  one 
method  of  inserting  the  exchange  edges.  Although  this  method  is  theoretically 
nice,  it  is  not  very  practical  since  it  uses  an  excessive  number  of  horizontal  tracks  to 
insert  the  exchange  edges.  For  example,  10  horizontal  tracks  were  used  to  insert 
the  exchange  edges  in  the  layout  shown  in  Figure  2-3  whereas  only  6  tracks  were 
required  in  the  layout  shown  in  Figure  4-2  (even  though  the  same  necklace 
orderings  were  used  for  both  layouts). 

The  complex  plane  diagram  can  still  be  of  use  when  inserting  exchange  edges, 
however.  For  example,  notice  that  the  top-to-bottom  orderings  of  the  exchange 
edges  across  most  of  the  vertical  cuts  which  are  located  between  necklaces  in  the 
layout  in  Figure  4-2  are  the  same  as  the  orderings  for  the  corresponding  cuts  in 
Figure  2-3.  In  general,  knowledge  of  the  level  structure  of  the  complex  plane 
diagram  is  very  helpful  in  optimizing  the  insertion  of  the  exchange  edges.  In  fact, 
we  relied  heavily  on  such  knowledge  when  constructing  the  optimal  layouts 
displayed  in  section  4.3. 

For very small shuffle-exchange graphs (e.g., for $k \le 5$), it is possible to find
optimal  embeddings  of  the  exchange  edges  by  trying  all  reasonable  possibilities. 
For  somewhat  larger  shuffle-exchange  graphs  (e.g.,  k  =  6,7),  however,  the  task  is 
substantially more difficult. In order to find the optimal layouts shown in section
4.3,  we 

1)  first  located  the  center  of  the  region  of  maximum  overlap  and  (using  the 
complex  plane  diagram  as  a  guide)  inserted  the  exchange  edges  which 
crossed  the  region  (one  edge  on  each  horizontal  track), 

2)  next  inserted  the  exchange  edges  located  in  neighboring  regions  without  (if 
possible)  introducing  any  additional  tracks,  and 

3)  lastly  inserted  the  remaining  exchange  edges  (again  without  adding  any  new 
horizontal  tracks). 

Steps  1  and  3  are  easy  but  step  2  can  be  difficult.  In  some  cases  it  is  necessary  to 
interchange  the  left  and  right  parts  of  some  necklaces  or  to  slide  a  node  around 
from  one  part  of  a  necklace  to  the  other.  For  k  =  6  and  7,  it  is  also  necessary  to 
introduce  an  extra  horizontal  track  at  step  2.  For  larger  shuffle-exchange  graphs,  it 
would  probably  be  necessary  to  introduce  even  larger  numbers  of  horizontal  tracks. 


4.2.3  Additional  Savings 

All  of  the  practical  layouts  we  have  considered  thus  far  have  two  horizontal 
tracks  which  are  used  solely  for  the  purpose  of  connecting  the  left  part  of  each 
necklace  to  the  right  part.  It  is  not  difficult  to  show  that  these  tracks  can  be 
eliminated  without  affecting  the  rest  of  the  layout.  As  an  example  of  how  this  can 
be  accomplished,  we  suggest  that  the  reader  compare  the  layout  of  the  32-node 
shuffle-exchange  graph  shown  in  Figure  4-2  with  that  in  Figure  4-5. 

Even  larger  savings  can  be  had  for  some  shuffle-exchange  graphs  by  doubling 
up  the  degenerate  necklaces  with  full  necklaces  in  the  same  pair  of  vertical  tracks, 
thus  reducing  the  number  of  vertical  tracks  used.  Of  course,  it  is  necessary  to 
rearrange  the  exchange  edges  somewhat  but,  as  degenerate  necklaces  have  very  few 
nodes  in  small  shuffle-exchange  graphs,  this  can  usually  be  done  without 
introducing  any  additional  horizontal  tracks.  For  example,  substantial  savings  can 
be achieved in this manner for the 16-node and 64-node shuffle-exchange graphs.

4.3  Optimal  Layouts 

In the following figures, we exhibit layouts for the 8-node, 16-node, 32-node,
64-node and 128-node shuffle-exchange graphs which are optimal up to the
constraints  described  in  section  4.1.2.  The  layouts  were  found  via  the  techniques 
described  in  section  4.2. 


Figure 4-3: A 2x6 layout for the 8-node shuffle-exchange graph.


Figure 4-6: An 11x18 layout for the 64-node shuffle-exchange graph.


4.4  Other  Layouts 

To  this  point,  we  have  considered  only  a  specific  class  of  layouts  for  the  shuffle- 
exchange  graph.  As  these  layouts  are  quite  good,  it  is  not  clear  that  we  need  to 
consider  others.  Nevertheless,  it  is  worth  pointing  out  that  slightly  better  layouts 
do  exist  for  some  shuffle-exchange  graphs.  For  example,  by  considering  layouts  in 
which  the  exchange  edges  are  allowed  to  bend  and  in  which  two  or  more  full 
necklaces  can  occupy  the  same  pair  of  vertical  tracks,  it  is  possible  to  construct  the 
layout for the 32-node shuffle-exchange graph shown in Figure 4-8.


Figure 4-8: An improved 7x9 layout for the 32-node shuffle-exchange graph.

It  is  likely  that  slight  improvements  can  also  be  made  for  larger  shuffle-exchange 
graphs.  At  this  point,  however,  we  feel  that  research  efforts  should  be  directed 
more  towards  implementation  of  the  good  layouts  already  discovered.  Once  this  is 
done,  it  will  be  much  clearer  whether  or  not  the  effort  necessary  to  further  reduce 
the  layout  area  is  justified. 


PART II


LOWER  BOUND  TECHNIQUES  FOR  VLSI 


CHAPTER  5 


REVIEW  OF  KNOWN  TECHNIQUES 


In  this  chapter,  we  review  the  known  techniques  for  determining  the  layout  area 
and  maximum  edge  length  of  an  arbitrary  VLSI  network.  We  also  preview  the 
results  we  will  prove  in  Chapters  6  through  8  of  the  thesis.  A  comparison  of  our 
lower  bounds  with  the  previously  known  upper  and  lower  bounds  can  be  found  in 
Tables  5-2  and  5-4. 

5.1  Area  Bounds 

One  of  the  most  important  problems  in  the  theory  of  VLSI  is  the  determination 
of  the  minimum  amount  of  area  required  to  lay  out  a  network  on  a  chip.  Given  an 
arbitrary  graph,  this  problem  has  two  parts;  namely, 

1)  finding  a  good  layout  for  the  graph,  and 

2)  showing  that  the  layout  is  optimal. 

There  are  a  variety  of  techniques  known  for  finding  good  layouts  for  specific 
graphs  [MR79,  PV79,  S79,  HL80,  MC80,  PV80,  SR80b,  T80,  BL81,  KLLM81, 
LLM81,  LM81,  PRS81,  T81],  but  the  only  known  general  technique  is  due  to 
Leiserson [L80a,L80b]. In particular, he showed how to construct a good layout for
any graph for which a good separator is known. (An N-node graph is said to have
an $f(N)$-separator if it can be partitioned into two equal-sized subgraphs $G_1$ and $G_2$
such that at most $f(N)$ edges link $G_1$ to $G_2$ and both $G_1$ and $G_2$ have
$f(N/2)$-separators.) We have summarized Leiserson’s results in Table 5-1.

There are two difficulties with Leiserson’s method. First, it is not always
possible to find a good separator for a graph. For instance, a minimal
O(N/log N)-separator was not found for the shuffle-exchange graph until after an
optimal O(N^2/log^2 N)-area layout was discovered. Secondly, the layouts produced by
Leiserson’s technique are not always optimal, even if a minimal separator is
known. For example, Leiserson’s technique requires O(N log^2 N) area to lay out the
N-node mesh, substantially more than is really needed. For the most part,
however, Leiserson’s method is a good one and certainly the most general
technique currently available.


Table 5-1

Upper Bounds on the Layout Area of
N-Node Graphs With Specified Separators

separator          upper bound on layout area

N^a, a < 1/2       N
N^a, a = 1/2       N log^2 N
N^a, a > 1/2       N^(2a)

Once  a  good  layout  for  a  network  has  been  found,  it  remains  to  show  that  the 
layout  is  optimal.  This  is  accomplished  by  proving  a  good  lower  bound  on  the 
layout  area  of  the  network.  The  only  known  methods  for  proving  such  lower 
bounds  are  due  to  Thompson  [T79,T80],  Vuillemin  [V80]  and  Lipton  and 
Sedgewick  [LS81].  They  have  concentrated  on  the  related  problem  of  proving 
lower  bounds  for  the  bisection  width  of  a  graph.  (The  bisection  width  of  a  graph  is 
the  minimum  number  of  edges  which  must  be  removed  in  order  to  separate  the 
graph  into  two  disjoint  and  equal-sized  subgraphs.) 
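
As an illustration of this definition, the following minimal Python sketch computes the bisection width of a small graph by exhaustive search. The function names and the 4x4 mesh used as input are illustrative assumptions rather than anything taken from the cited techniques, and the search is only feasible for very small graphs.

```python
from itertools import combinations


def bisection_width(nodes, edges):
    """Minimum number of edges cut over all splits into two equal halves."""
    nodes = list(nodes)
    half = len(nodes) // 2
    first = nodes[0]  # pin one node to a side so each split is tried once
    best = None
    for rest in combinations(nodes[1:], half - 1):
        side = set(rest) | {first}
        cut = sum(1 for u, v in edges if (u in side) != (v in side))
        if best is None or cut < best:
            best = cut
    return best


def mesh_edges(n):
    """Edges of the n x n mesh (grid) graph on nodes (i, j)."""
    edges = []
    for i in range(n):
        for j in range(n):
            if i + 1 < n:
                edges.append(((i, j), (i + 1, j)))
            if j + 1 < n:
                edges.append(((i, j), (i, j + 1)))
    return edges


mesh_nodes = [(i, j) for i in range(4) for j in range(4)]
b = bisection_width(mesh_nodes, mesh_edges(4))
print(b)
```

The number of balanced splits grows exponentially with the number of nodes, which is one reason the bisection width of large networks is bounded analytically rather than computed.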

Thompson  was  the  first  to  notice  the  relationship  between  bisection  width  and 
layout  area.  In  particular,  he  showed  that  the  wire  area  of  a  graph  with  bisection 
width b is at least Ω(b^2). In what follows, we prove the slightly weaker (and
simpler) result for layout area.

Theorem 5-1 (Thompson [T79]): The layout area of a graph with bisection width
b is at least Ω(b^2).

Proof: Consider an optimal layout of a graph G with bisection width b. Cut the
layout horizontally so that precisely 1/2 of the nodes of G are above the cut. (For
an example, see the diagram in Figure 5-1.) Since at least b edges must cross the
cut, the layout must contain at least b-1 vertical tracks. A similar argument
reveals that the layout must also have at least b-1 horizontal tracks. Thus the area
of the layout is at least (b-1)^2 = Ω(b^2). □

Figure 5-1: A horizontal bisection of a layout.


Although  the  task  of  finding  a  good  lower  bound  on  the  bisection  width  of  a 
graph is difficult in general, Thompson [T79] was successful in finding good
bisection width lower bounds for a variety of computationally useful networks.
For example, he used information transfer arguments to show that any network
which is capable of computing the discrete Fourier transform on N elements in T
steps must have bisection width at least Ω(N/T). Among other things, he was thus
able to conclude that at least Ω(N^2/log^2 N) area is required to lay out the N-node
shuffle-exchange graph.

Thompson’s  work  has  recently  been  extended;  first  by  Vuillemin  [V80]  and  then 
by  Lipton  and  Sedgewick  [LS81].  Vuillemin  characterized  a  broad  class  of  graphs 
for  which  Thompson’s  lower  bound  arguments  can  be  applied  while  Lipton  and 
Sedgewick  showed  how  to  use  crossing  sequence  arguments  to  prove  lower  bounds 
for  an  even  larger  class  of  graphs. 

Although  the  methods  of  Thompson,  Vuillemin,  Lipton  and  Sedgewick  are  quite 
elegant  and  useful  in  establishing  good  bisection  width  lower  bounds  for  certain 
graphs,  their  applicability  is  inherently  limited  to  graphs  for  which  the  layout  area 
is  no  more  than  a  constant  times  as  large  as  the  square  of  the  bisection  width. 
Thus  they  have  not  been  of  use  in  resolving  two  of  the  key  open  questions  in  VLSI 
theory;  namely, 

1)  "How  much  area  is  needed  to  lay  out  a  planar  graph?"  and 

2) "How much area is needed to lay out a graph which has an
O(N^(1/2))-separator?"


The  planar  graph  question  is  particularly  important  since,  as  we  will  show  in 
Chapter  7,  the  layout  problem  of  an  arbitrary  graph  can  be  reduced  to  that  for  a 
planar  graph.  No  nontrivial  lower  bounds  have  been  found  for  either  problem, 
however.  As  we  mentioned  previously,  the  best  procedure  known  requires 
O(N log^2 N) area to lay out an arbitrary N-node graph with an O(N^(1/2))-separator.
As Lipton and Tarjan [LT77] have shown that every N-node planar graph has an
O(N^(1/2))-separator, the O(N log^2 N)-area layout procedure also works for planar
graphs.  Although  it  is  suspected  that  better  layout  procedures  exist  for  planar 
graphs,  none  have  yet  been  found. 

In  the  thesis,  we  pursue  an  entirely  different  strategy  in  developing  new  lower 
bound  techniques  for  VLSI.  Whereas  previous  researchers  have  been  concerned 
primarily  with  the  bisection  width  of  a  network,  we  shall  be  concerned  with  its 
crossing  number  and  wire  area.  Both  are  lower  bounds  on  the  layout  area  of  any 
graph. In fact, we will show in Chapter 7 that

Ω(b^2) ≤ c + N ≤ w ≤ A

for any N-node graph with bisection width b, crossing number c, wire area w and
layout  area  A. 

The  preceding  inequality  implies  that  every  lower  bound  technique  for  the 
bisection  width  of  a  graph  is  also  a  lower  bound  technique  for  its  crossing  number 
and  wire  area.  Thus  nothing  is  lost  by  forgetting  about  bisection  width  and 
concentrating one’s efforts on finding good lower bounds for the crossing number
and wire area of a graph. In fact, much can be gained. For example, we will use
such techniques to find

1) an N-node planar graph which has layout area Ω(N log N), and

2) an N-node (nonplanar) graph with an O(N^(1/2))-separator which has layout
area Ω(N log^2 N).

The  first  result  demonstrates  that  not  all  planar  graphs  can  be  laid  out  in  linear 
area,  thus  disproving  a  conjecture  thought  by  many  to  be  true.  The  second  result 
indicates that Leiserson’s O(N log^2 N)-area layout technique for graphs with
O(N^(1/2))-separators is optimal at least some of the time and thus cannot, in general,
be  improved. 

For easy reference, we have summarized our results along with the previously
known upper and lower bounds in the following table. The upper bounds are due
to Leiserson [L80a] and represent the maximal amount of area needed to lay out
any graph with the designated property. The lower bounds, on the other hand,
represent the minimal amount of area required to lay out a specific class of graphs
with the designated property. The previously known lower bounds are, for the
most part, trivial. The only exception is the N^(2a) bound which, as a corollary of
Theorem 5-1, is due to Thompson [T79].


Table 5-2
Area Bounds

separator         previous        our             upper
                  lower bound     lower bound     bound

N^a, a < 1/2      N                               N
N^a, a = 1/2      N               N log^2 N       N log^2 N
N^a, a > 1/2      N^(2a)                          N^(2a)
(planar)          N               N log N         N log^2 N


5.2  Edge  Length  Bounds 

There  has  been  a  great  deal  of  interest  lately  in  the  problem  of  minimizing  the 
length  of  the  longest  wire  in  VLSI  layouts  [BL81,CM81,PRS81].  It  is  not  difficult 
to  show  that  the  length  of  the  longest  wire  in  any  reasonable,  area-optimal  VLSI 
layout  is  at  most  a  constant  times  the  square  root  of  the  layout  area.  (Otherwise, 
some  wire  would  be  longer  than  the  perimeter  of  the  layout,  which  is 
unreasonable.)  Bhatt  and  Leiserson  [BL81]  recently  found  better  layouts  for  graphs 
with  small  separators.  We  have  summarized  their  results  in  Table  5-3.  (For 
completeness,  we  have  also  included  the  trivial  bound  for  graphs  with  large 
separators.) 

It  is  worth  noting  that  the  layouts  which  achieve  the  bounds  in  Table  5-3 
simultaneously  achieve  the  best  known  bounds  for  layout  area.  Thus  no  layout 
area/maximum edge length tradeoffs are apparent.


Table 5-3

Upper Bounds on the Maximum Edge Length of
N-Node Graphs With Specified Separators

separator          upper bound on maximum edge length

N^a, a < 1/2       N^(1/2)/log N
N^a, a = 1/2       N^(1/2) log N/loglog N
N^a, a > 1/2       N^a


Very  little  has  been  accomplished  in  the  way  of  lower  bounds,  however,  since 
bisection  width  arguments  do  not  seem  to  be  applicable  to  edge  length 
considerations.  In  fact,  the  only  known  lower  bound  for  maximum  edge  length  is 
the  trivial  lower  bound  derived  from  the  diameter  of  a  graph.  (The  diameter  of  a 
graph  is  the  greatest  distance  between  any  pair  of  nodes  in  the  graph  where 
distance  is  defined  to  be  the  length  of  the  shortest  path  linking  the  pair  of  nodes.) 
The  precise  lower  bound  is  stated  in  the  following  theorem. 

Theorem 5-2: Any layout of a graph G with diameter d and layout area A has
some edge of length at least A^(1/2)/(3d).

Proof: Let T be any layout of G and q be the length of the longest wire in T.
We will use T to construct another layout T' of G which has at most 9d^2q^2 area.
Since any layout for G has at least A area, this will be sufficient to show that
q ≥ A^(1/2)/(3d).

Since every pair of nodes in G is linked by a path of length d or less, we can
conclude that every pair of nodes are within distance dq of each other in T.
(Otherwise, some edge would have length greater than q in T, a contradiction.)
Thus, all of the nodes are contained in some dq x dq square in T. Since every
wire which leaves the square must re-enter at some other point, we can conclude
that at most 2dq wires can cross the boundary of the square at any point. By
rewiring the portion of T which is outside the square, it is possible to produce a
second layout T' for G which has at most 2dq additional horizontal tracks and 2dq
additional vertical tracks. (One additional horizontal track and one additional
vertical track are needed to replace each wire.) Each side of T' thus has length at
most dq + 2dq = 3dq, and so the total area of T' is at most 9d^2q^2. (As an example
of how the rewiring should be done, we have included Figure 5-2.) □


Figure 5-2: Rewiring the outer portion of a layout.
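
The arithmetic behind Theorem 5-2 can be checked with a short Python sketch. The function names and the numbers used (area 900, diameter 5) are illustrative assumptions only; the sketch evaluates the bound A^(1/2)/(3d) and confirms that a rewired layout of side 3dq has enough area when q equals that bound.

```python
import math


def max_edge_lower_bound(area, diameter):
    """Theorem 5-2: some edge has length at least sqrt(area) / (3 * diameter)."""
    return math.sqrt(area) / (3 * diameter)


def rewired_area_bound(diameter, longest_wire):
    """Area bound 9 d^2 q^2 for the rewired layout used in the proof."""
    side = 3 * diameter * longest_wire  # dq for the square plus 2dq extra tracks
    return side * side


# Illustrative numbers only: a layout of area 900 for a graph of diameter 5.
q = max_edge_lower_bound(900, 5)
print(q, rewired_area_bound(5, q))
```

Any wire length below the computed value of q would give a rewired layout with less than the required area, which is the contradiction used in the proof.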


It is not difficult to construct N-node graphs with f(N)-separators which have
log N diameter for any f(N). By Theorem 5-2, any layout of such a graph must
have a wire of length Ω(f(N)/log N). Using crossing number and wire area
arguments, however, we will find examples of graphs which must contain even
longer wires. In particular, we will describe

1) an N-node planar graph for which any layout must have a wire of length
Ω(N^(1/2)/log^(1/2) N),

2) an N-node graph with an O(N^(1/2))-separator for which any layout must have
a wire of length Ω(N^(1/2) log N/loglog N), and

3) an N-node graph with an O(N^(1-1/r))-separator for which any layout must
have a wire of length Ω(N^(1-1/r)) for any r ≥ 3.

The  latter  two  results  achieve  the  known  upper  bounds  for  maximum  wire 
length.  They  also  indicate  that  some  wires  in  some  layouts  must  be  very  long 
(possibly  as  long  as  the  length  of  the  entire  layout). 

For convenience, we have summarized our edge length results along with the
previously known upper and lower bounds in Table 5-4. The upper bounds are
due to Bhatt and Leiserson [BL81] while the lower bounds are all easy corollaries
of Theorem 5-2.


Table 5-4
Maximum Edge Length Bounds

separator         previous           our                       upper
                  lower bound        lower bound               bound

N^a, a < 1/2      N^(1/2)/log N                                N^(1/2)/log N
N^a, a = 1/2      N^(1/2)/log N      N^(1/2) log N/loglog N    N^(1/2) log N/loglog N
N^a, a > 1/2      N^a/log N          N^a                       N^a
(planar)          N^(1/2)/log N      N^(1/2)/log^(1/2) N       N^(1/2) log N/loglog N


CHAPTER 6


NETWORK  CONSTRUCTIONS 


In  this  chapter,  we  will  describe  the  networks  for  which  we  will  later  establish 
layout  area  and  maximum  edge  length  lower  bounds.  As  the  networks  are  new 
and  interesting  in  their  own  right,  we  will  discuss  each  at  some  length. 

6.1  The  2-Dimensional  Mesh  of  Trees 

The N-node 2-dimensional mesh of trees will be the first example of a graph
with an O(N^(1/2))-separator known to have layout area Ω(N log^2 N) and maximum
edge length Ω(N^(1/2) log N/loglog N).

6.1.1 Definition

The 2-dimensional nxn mesh of trees M_{2,n} (where n is assumed to be a power of
2) is defined as follows. Starting with an nxn matrix of nodes and adding nodes
wherever necessary, construct a complete binary tree in each row and column of
the matrix. The trees should be constructed so that

1) the leaves in each tree are precisely the nodes in the corresponding row or
column of the original matrix, and

2) the subgraph induced on the nodes in each quadrant is M_{2,n/2}.

For example, we have drawn M_{2,4} in Figure 6-1. The nodes in the original 4x4
matrix are represented by dots. The nodes which were added in order to form row
trees are drawn as small triangles while those added to form column trees are
shown as small squares. The row tree edges are drawn with solid lines while
dashed lines represent edges of column trees. Notice that if we were to remove the
roots of the row and column trees of M_{2,4} and the edges incident to them, we
would be left with 4 copies of M_{2,2}, one in each quadrant. In general, if we
remove the nodes and edges in the top k levels of the binary trees in M_{2,n}, we
will be left with 2^(2k) copies of M_{2,n/2^k}. This important property of meshes of
trees is used extensively throughout Chapters 7 and 8.


Figure 6-1: The 4x4 mesh of trees M_{2,4}.


6.1.2 Properties

It is not difficult to show that the nxn mesh of trees M_{2,n} has

1) N = 3n^2 - 2n = Θ(n^2) nodes,

2) bisection width n = Θ(N^(1/2)),

3) diameter 4 log n = Θ(log N), and

4) an O(N^(1/2))-separator.
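
The node count and diameter can be checked on small instances with the following minimal Python sketch. The node labelling scheme and the function names are assumptions made for illustration; the sketch builds the mesh of trees as an adjacency map and measures its diameter by breadth-first search.

```python
from collections import deque


def mesh_of_trees_2d(n):
    """Adjacency map of the n x n mesh of trees (n a power of 2, n >= 2).

    Leaves are ("leaf", i, j). The internal node of the row tree for row i
    that covers columns lo..hi-1 is ("row", i, lo, hi); column trees are
    labelled ("col", j, lo, hi) in the same way.
    """
    adj = {}

    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def build(kind, line, lo, hi):
        # Return the node that roots the subtree over positions lo..hi-1.
        if hi - lo == 1:
            return ("leaf", line, lo) if kind == "row" else ("leaf", lo, line)
        mid = (lo + hi) // 2
        root = (kind, line, lo, hi)
        add_edge(root, build(kind, line, lo, mid))
        add_edge(root, build(kind, line, mid, hi))
        return root

    for line in range(n):
        build("row", line, 0, n)
        build("col", line, 0, n)
    return adj


def diameter(adj):
    """Largest shortest-path distance, by breadth-first search from every node."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


for n in (2, 4, 8):
    graph = mesh_of_trees_2d(n)
    print(n, len(graph), 3 * n * n - 2 * n, diameter(graph))
```

For each n tried, the second and third printed values should agree, and the last should equal 4 log n.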

By applying the methods discussed in Chapter 5, we can thus conclude that the
N-node 2-dimensional mesh of trees has

1) crossing number at most O(N log^2 N),

2) layout area between Ω(N) and O(N log^2 N), and

3) maximum edge length between Ω(N^(1/2)/log N) and O(N^(1/2) log N/loglog N).

In fact, we will show in Chapters 7 and 8 that the N-node 2-dimensional mesh of
trees has

1) crossing number Ω(N log N),

2) layout area Ω(N log^2 N), and

3) maximum edge length Ω(N^(1/2) log N/loglog N).

Thus the 2-dimensional mesh of trees is the first graph with an
O(N^(1/2))-separator known to achieve the upper bound for layout area discovered by
Leiserson [L80a] and the upper bound for maximum edge length discovered by
Bhatt and Leiserson [BL81].

6.1.3  Applications 

Computationally,  the  nxn  mesh  of  trees  is  a  very  powerful  network.  Among 
other  things,  it  can  be  used  to 

1) multiply a fixed nxn matrix by m different n-vectors in m + 2 log n (word)
steps,

2) sort a list of n m-bit words in 2m + 5 log n (bit) steps, and

3) link n input terminals to n output terminals in any order in log n (bit) steps.

The  algorithms  and  processors  needed  for  these  operations  are  quite  simple.  For 
example, the processors needed for sorting and switching need only contain a few
AND and OR gates while those for matrix-vector multiplication need only contain a
word  multiplier  or  adder.  We  describe  the  algorithms  needed  for  these  operations 
in  the  following  three  subsections. 

(a)  Matrix-Vector  Multiplication 

Given any fixed nxn matrix S = (s_ij), we will show how to program M_{2,n} to
compute the product of S and any m input n-vectors in m + 2 log n (word) steps.
As S is fixed, it is not considered to be part of the on-line input. Rather, it is
considered to be part of the program (in the form of off-line input) and thus we
assume that the value of s_ij is initially stored in the (i,j) leaf of M_{2,n} for each i
and j. The algorithm proceeds as follows.

Given any input vector v = (v_j), input the jth entry v_j into the root of the jth
column tree for each j, 1 ≤ j ≤ n. Pass the entries of v down the column trees so that
after log n steps, each leaf in the jth column tree has received the value of v_j.
Computation of the n^2 products {s_ij v_j | 1 ≤ i,j ≤ n} can now take place
simultaneously. Afterwards, we can find the entries of the product vector Sv by
summing the values of the leaves in each row tree. This operation takes an
additional log n steps.

The total running time of the algorithm just described is 1 + 2 log n. By
pipelining the input vectors through the column trees and the output sums through
the row trees, it is not difficult to see that m such products can be calculated in
m + 2 log n steps.
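
The three phases of the algorithm (broadcast down the column trees, multiplication at the leaves, summation up the row trees) can be simulated with a minimal Python sketch. The function names and the sample matrix are illustrative assumptions, and the simulation models only the data flow and step count of a single product, not the pipelining.

```python
def tree_sum(values):
    """Sum a list whose length is a power of 2 level by level, as a binary tree would.

    Returns (total, levels), where levels is the number of tree levels used.
    """
    levels = 0
    while len(values) > 1:
        values = [values[k] + values[k + 1] for k in range(0, len(values), 2)]
        levels += 1
    return values[0], levels


def mesh_of_trees_matvec(S, v):
    """Simulate one matrix-vector product on the n x n mesh of trees.

    S[i][j] is stored in leaf (i, j); v[j] enters at the root of column tree j.
    Returns (product, steps), where steps counts word steps.
    """
    n = len(S)
    log_n = n.bit_length() - 1
    assert 1 << log_n == n, "n is assumed to be a power of 2"

    # Broadcast: after log n steps every leaf (i, j) holds v[j].
    leaf_input = [[v[j] for j in range(n)] for _ in range(n)]
    steps = log_n

    # One step: all n^2 leaf products are formed simultaneously.
    leaf_product = [[S[i][j] * leaf_input[i][j] for j in range(n)] for i in range(n)]
    steps += 1

    # Each row tree sums its leaves in log n further steps.
    product = [tree_sum(leaf_product[i])[0] for i in range(n)]
    steps += log_n
    return product, steps


S = [[1, 2, 3, 4], [0, 1, 0, 1], [2, 0, 2, 0], [1, 1, 1, 1]]
v = [1, 2, 3, 4]
print(mesh_of_trees_matvec(S, v))
```

For n = 4 the step count returned is 1 + 2 log n = 5, matching the running time stated above for a single product.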

(b)  Sorting 

The algorithm for sorting proceeds as follows. Starting at the roots, input (bit by
bit) the ith word to be sorted into the ith row and column trees for each i, 1 ≤ i ≤ n.
Pass the bits down each tree so that after log n steps the leading bit of the ith word
has reached each leaf of the ith row and column trees. Comparison of the ith and
jth words for all i and j can now proceed simultaneously. After at most m
additional steps, the (i,j) leaf has decided whether the ith word is smaller or larger
than the jth word. Ties are broken arbitrarily (e.g., depending on the values of i
and j). Once this is done, each leaf transmits a 0 or a 1 to its column tree father
depending on whether its column tree word was smaller or larger than its row tree
word. Each column tree then sums these values in order to determine the position
of its word in the final ordering. (If the sum is carried out bit by bit starting with
the least significant bit, this process takes 2 log n steps.) This information is then
used to mark a path in each column tree from the root to that leaf which is also in
the appropriate row tree (again taking 2 log n steps). It is now a simple matter to
transmit the bits of the ith word along the unique path from the ith column tree
root to the appropriate row root for each i. As the paths are all pairwise disjoint,
this process takes only m + 2 log n steps.

The algorithm just described sorts a list of n m-bit numbers in 2m + 7 log n steps.
It is a simple exercise to speed up the algorithm to obtain the 2m + 5 log n step
bound. We should also point out that this algorithm is similar to the one described
by Muller and Preparata in [MP75]. The VLSI implementation of the algorithm is
new, however, and far superior to many of the VLSI sorting algorithms discussed
by Thompson in his recent survey paper [T81].
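
The ranking idea behind the sorting algorithm can be sketched in a few lines of Python. The function name is an illustrative assumption, ties are broken by index (one of the arbitrary choices the text permits), and the sketch models only which word ends up at which row root, not the bit-serial timing.

```python
def mesh_of_trees_sort(words):
    """Rank-based sort mirroring the mesh of trees algorithm.

    Leaf (i, j) compares word i (its row tree word) with word j (its column
    tree word). Column tree j then counts how many words precede word j,
    which is the final position of word j.
    """
    n = len(words)

    def precedes(i, j):
        # Word i precedes word j; ties are broken by index.
        return (words[i], i) < (words[j], j)

    # Leaf (i, j) sends a 1 up column tree j when word j is the larger one.
    rank = [sum(1 for i in range(n) if precedes(i, j)) for j in range(n)]

    # Word j travels from column root j to the row root numbered rank[j].
    output = [None] * n
    for j in range(n):
        output[rank[j]] = words[j]
    return output


print(mesh_of_trees_sort([5, 3, 8, 3]))
```

Because the tie-breaking rule makes the comparison a total order, the ranks form a permutation of 0, ..., n-1, which is why the routing paths in the final phase are pairwise disjoint.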


(c)  Switching 


Given the algorithm just described for sorting, it is clear how to program M_{2,n} to
serve as a switching network for n input and output lines. For example, assume
that the ith input line is to be connected to the jth output line for some i and j. In
order to do this, we first hook up the ith input line to the ith column root. We
next establish a path from the root of the ith column tree to that leaf in the tree
which is also in the jth row tree. This can be done by inspection of the binary
representation b_1 ... b_{log n} of the number j. More precisely, at the kth level of the
binary tree, we branch left or right depending on whether b_k is 0 or 1
(respectively). Lastly, we link the appropriate leaf of the jth row tree to the root of
the jth row tree and then to the jth output line (again taking log n steps).

The algorithm just described takes 2 log n steps to link n input lines to n output
lines in any order. It is not difficult to show that if the row tree connections are
hardwired in advance (i.e., by linking the root of each row tree to all of its leaves),
then the input-output connections can be properly made in just log n steps.
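
The path selection by binary representation can be illustrated with a minimal Python sketch. The function names are hypothetical, rows are numbered from 0 rather than 1, and the left subtree is assumed to cover the lower-numbered half of the leaves.

```python
def column_tree_path(j, n):
    """Left/right choices from a column-tree root to the leaf in row j.

    Rows are numbered 0..n-1 and n is a power of 2. The kth choice is
    determined by the kth most significant of the log n bits of j.
    """
    log_n = n.bit_length() - 1
    bits = [(j >> (log_n - 1 - k)) & 1 for k in range(log_n)]
    return ["right" if b else "left" for b in bits]


def follow(path, n):
    """Replay a path on the index range 0..n-1 and return the leaf reached."""
    lo, hi = 0, n
    for turn in path:
        mid = (lo + hi) // 2
        if turn == "left":
            hi = mid
        else:
            lo = mid
    return lo


print(column_tree_path(5, 8), follow(column_tree_path(5, 8), 8))
```

Replaying the path for any j between 0 and n-1 should return j itself, confirming that the bits of j select the intended leaf.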

6.2 The r-Dimensional Mesh of Trees

The N-node r-dimensional mesh of trees (for r > 2) will be the first example of a
graph with an O(N^a)-separator (for a > 1/2) known to have maximum edge length
Ω(N^a).

6.2.1  Definition 

The  2-dimensional  mesh  of  trees  can  be  easily  generalized  to  higher  dimensions. 
For example, the 3-dimensional nxnxn mesh of trees M_{3,n} can be constructed as
follows. Starting with an nxnxn cube of nodes and adding nodes wherever
necessary, construct a set of n^2 complete binary trees in each of the three
dimensions of the cube. As before, the trees should be constructed so that the
leaves are precisely the nodes of the original cube and so that the subgraph
induced on each octant of nodes is M_{3,n/2}. The general r-dimensional mesh of
trees M_{r,n} is formed from an r-dimensional nxnx...xn hypercube in a similar
manner. In general, removal of the roots and edges which are in the top level of
the binary trees will leave 2^r copies of M_{r,n/2}.


6.2.2  Properties 

It is easily observed that the r-dimensional nxnx...xn mesh of trees M_{r,n} has
(for bounded r)

1) N = (r+1)n^r - rn^(r-1) = Θ(n^r) nodes,

2) bisection width n^(r-1) = Θ(N^(1-1/r)),

3) diameter 2r log n = Θ(log N), and

4) an O(N^(1-1/r))-separator.
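
The node count can be verified by counting leaves and internal tree nodes separately, as in the following minimal Python sketch. The function names are illustrative assumptions.

```python
def mesh_of_trees_node_count(r, n):
    """Count the nodes of the r-dimensional mesh of trees piece by piece.

    There are n**r leaves. Each of the r dimensions contributes n**(r-1)
    complete binary trees, and each tree adds n - 1 internal nodes.
    """
    leaves = n ** r
    trees = r * n ** (r - 1)
    return leaves + trees * (n - 1)


def closed_form(r, n):
    """The expression (r+1) n^r - r n^(r-1) quoted in the text."""
    return (r + 1) * n ** r - r * n ** (r - 1)


for r in (2, 3, 4):
    for n in (2, 4, 8):
        print(r, n, mesh_of_trees_node_count(r, n), closed_form(r, n))
```

For r = 2 the closed form reduces to 3n^2 - 2n, in agreement with the count given for the 2-dimensional mesh of trees.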

Thus we can easily infer that the N-node r-dimensional mesh of trees has (for
bounded r)

1) crossing number at most O(N^(2-2/r)),

2) layout area Θ(N^(2-2/r)), and

3) maximum edge length between Ω(N^(1-1/r)/log N) and O(N^(1-1/r)).

In fact, we will show in Chapter 7 that the graph has

1) crossing number Θ(N^(2-2/r)), and

2) maximum edge length Ω(N^(1-1/r)).

Thus the r-dimensional mesh of trees is the first graph with an O(N^a)-separator
(for a > 1/2) known to achieve the trivial upper bound on maximum edge length.

6.2.3  Application  to  Matrix  Multiplication 

Computationally, the r-dimensional mesh of trees is a very powerful network.
For example, M_{3,n} can be used to multiply m pairs of nxn matrices in m + 2 log n
(word) steps. The algorithm is very similar to the one used by M_{2,n} to compute
matrix-vector products. It proceeds as follows.

At each time step, a pair of matrices is entered into the network via the roots of
the trees in two of the dimensions (one dimension for each matrix). The entries
are passed down through the trees so that after log n steps, the leaf in the (r,s,t)
position of the cube contains the (r,s) entry of the first matrix and the (s,t) entry of
the second matrix for each r, s and t. All n^3 multiplications can then be performed
simultaneously. The entries of the product matrix are then calculated by summing
the values of the leaves of each tree in the third (previously unused) dimension.
This process takes an additional log n steps. As the network is easily pipelined, it is
clear that the total computation time is just m + 2 log n (word) steps.
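
The data flow of this algorithm can be sketched in Python as follows. The function names are illustrative assumptions; the sketch reproduces which values meet at each leaf and which tree sums them, and reports the step count quoted in the text rather than simulating the pipeline itself.

```python
def mesh_of_trees_matmul(A, B):
    """Simulate one matrix product on the 3-dimensional n x n x n mesh of trees."""
    n = len(A)
    # After log n steps, leaf (r, s, t) holds A[r][s] and B[s][t]; all n^3
    # products are then formed in a single step.
    leaf = {
        (r, s, t): A[r][s] * B[s][t]
        for r in range(n) for s in range(n) for t in range(n)
    }
    # The tree running along the s-dimension through (r, *, t) sums its leaves,
    # giving entry (r, t) of the product after log n further steps.
    return [[sum(leaf[r, s, t] for s in range(n)) for t in range(n)]
            for r in range(n)]


def pipelined_steps(m, n):
    """Word steps for m pipelined products: m + 2 log n, as stated in the text."""
    log_n = n.bit_length() - 1
    return m + 2 * log_n


print(mesh_of_trees_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
print(pipelined_steps(3, 8))
```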

6.2.4  A  Further  Generalization 

The r-dimensional mesh of trees was defined as a natural generalization of the
computationally powerful 2-dimensional mesh of trees. M_{r,n} can also be viewed as
a generalization of the r-cube, also a very powerful communications network. For
example, M_{r,2} is an r-cube with every edge replaced by a path of length 2. Viewed
in this light, the r-dimensional mesh of trees motivates the definition of a
shuffle-tree graph in the same way that the r-cube motivates the definition of the
shuffle-exchange graph. Although we have yet to investigate this graph in detail, it
is quite possible that it has important applications.

(As an aside, we should caution the reader that the asymptotic estimates given in
section 6.2.2 do not necessarily apply to M_{r,2} since r was assumed to be bounded.
The correct estimates are not difficult to work out, however.)

6.3  The  Tree  of  Meshes 

The N-node tree of meshes will be the first example of a planar graph known to
have Ω(N log N) layout area.

6.3.1  Definition 

The  tree  of  meshes  is  similar  to  the  2-dimensional  mesh  of  trees  in  that  it 
combines  the  structure  of  a  mesh  with  that  of  a  complete  binary  tree  in  a  natural 
way.  Unlike  the  2-dimensional  mesh  of  trees,  however,  the  tree  of  meshes  is  a 
planar  graph.  It  is  formed  by  replacing  each  node  of  a  complete  binary  tree  with  a 
mesh  and  each  edge  by  several  edges  which  link  the  meshes  together.  More 
precisely,  the  root  of  the  binary  tree  is  replaced  by  an  nxn  mesh  (where  n  is 
assumed  to  be  a  power  of  2),  its  sons  are  replaced  by  n/2  x  n  meshes,  their  sons  are 
replaced by n/2 x n/2 meshes, and so on until the leaves are replaced by 1x1
meshes. In the place of each right edge of the binary tree (i.e., one which links a
node to its right son), we link the rightmost column of nodes in the mesh
corresponding to the father to the topmost row of nodes in the mesh corresponding
to the right son. Similar replacements are made for left edges of the binary tree. In
both cases, the connections are made so as to preserve the column and row order
of the nodes and to ensure that the resulting graph will be planar. The resulting
graph is referred to as the nxn tree of meshes and will be denoted by T_n. For
example, we have drawn T_4 in Figure 6-2.


6.3.2  Properties 

It is easily seen that the nxn tree of meshes T_n has

1) N = 2n^2 log n + n^2 = Θ(n^2 log n) nodes,

2) bisection width n = Θ(N^(1/2)/log^(1/2) N),

3) diameter 8n = Θ(N^(1/2)/log^(1/2) N), and

4) an O(N^(1/2)/log^(1/2) N)-separator.
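
The node count follows from the observation that every level of the underlying binary tree contributes n^2 mesh nodes. A minimal Python sketch of this count is given below; the function names are illustrative assumptions, and the sketch tracks only the mesh dimensions at each level, not the edges between meshes.

```python
def tree_of_meshes_shapes(n):
    """List (number_of_meshes, rows, cols) for each level of T_n, root first."""
    shapes = []
    count, rows, cols = 1, n, n
    while True:
        shapes.append((count, rows, cols))
        if rows == 1 and cols == 1:
            break
        # Alternately halve the two dimensions, following the sequence
        # n x n, n/2 x n, n/2 x n/2, ... down to 1 x 1.
        if rows == cols:
            rows //= 2
        else:
            cols //= 2
        count *= 2
    return shapes


def tree_of_meshes_node_count(n):
    """Total number of mesh nodes in T_n (n a power of 2)."""
    return sum(count * rows * cols for count, rows, cols in tree_of_meshes_shapes(n))


for n in (2, 4, 8, 16):
    log_n = n.bit_length() - 1
    print(n, tree_of_meshes_node_count(n), 2 * n * n * log_n + n * n)
```

There are 2 log n + 1 levels in all, each holding n^2 nodes, which gives the total 2n^2 log n + n^2.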

Thus we can easily infer that the N-node tree of meshes has

1) layout area between Ω(N) and O(N log N), and

2) maximum edge length between Ω(log^(1/2) N) and O(N^(1/2) log^(1/2) N).

In fact, we will show that the graph has

1) layout area Ω(N log N) and

2) maximum edge length Ω(log N).

The  maximum  edge  length  bound  is  fairly  straightforward.  We  will  show  in 
Chapter 8 that the wire area of the N-node tree of meshes is Ω(N log N). As the
graph has O(N) wires, we can conclude that some of them must have length at least
Ω(log N). The lower bound can, in fact, be achieved by a straightforward
modification of the H-tree layout for binary trees [MR79].

In section 6.4, we will show how to augment the N-node tree of meshes so that
any layout will have to contain a wire of length at least Ω(N^(1/2)/log^(1/2) N).

6.3.3  Applications 

The  tree  of  meshes  is  a  particularly  interesting  planar  graph  since  it  can  embed 
arbitrary  planar  graphs  much  more  efficiently  than  can  the  ordinary  mesh.  For 
example, it is not known how to embed an arbitrary planar graph in less than a
Θ(N log^2 N)-node mesh. As we show in part (a) of this section, however, any
N-node planar graph can be embedded in an O(N log N)-node tree of meshes.

The tree of meshes can also be used to embed many nonplanar graphs which
have O(N^(1/2))-separators. For example, we will show in part (b) of this section how
to embed M_{2,n} in T_n for any n. This result will later allow us to give a simple
proof that the N-node tree of meshes has wire area at least Ω(N log N).

(a)  Embeddings  of  Planar  Graphs 

In [LT77], Lipton and Tarjan prove an O(N^(1/2))-separator theorem for the class
of planar graphs. Recently, Bhatt and Leiserson [BL81] generalized this result by
showing that the class of planar graphs has an O(N^(1/2))-simultaneous separator.
(An N-node graph G is said to have an f(N)-simultaneous separator if for any
2-coloring (say, black and white) of the nodes of G, there are disjoint subgraphs G1
and G2 of G such that G1 and G2 each contain 1/2 of the black nodes and 1/2 of
the white nodes of G, at most f(N) edges link G1 to G2, and both G1 and G2 have
f(N/2)-simultaneous separators.) In the following theorem, we show that any
N-node graph with an O(N^(1/2))-simultaneous separator can be embedded in an
O(N log N)-node tree of meshes. As a corollary, we will thus be able to conclude
that any N-node planar graph can be embedded in an O(N log N)-node tree of
meshes.


Theorem 6-1: Every N-node graph with an O(N^(1/2))-simultaneous separator can
be embedded in an O(N log N)-node tree of meshes.

Proof: Let G be an N-node graph with an f(N)-simultaneous separator. (f(N)
will later be chosen to be O(N^(1/2)).) Partition G into two subgraphs G1 and G2 in
accordance with the usual separator theorem. Color the nodes of G1 (G2) white or
black according to whether or not they are linked to a node in G2 (G1). (To be
precise, we should also weight each node according to the number of nodes in the
other subgraph to which it is adjacent.) Now use the simultaneous separator to
partition G1 and G2. Proceed in this manner until only isolated nodes remain. At
each step, color the nodes in the subgraph white if they are adjacent to some node
outside of the subgraph and black if they are adjacent only to nodes within the
subgraph.

After the first step, at most f(N) edges will link each (N/2)-node subgraph to the
other. After the second step, at most f(N)/2 + f(N/2) edges will link each (N/4)-
node subgraph to any other. Using induction, it is not difficult to show that after k
steps, at most

    f(N)/2^{k-1} + f(N/2)/2^{k-2} + f(N/4)/2^{k-3} + ... + f(N/2^{k-2})/2 + f(N/2^{k-1})

edges will link each (N/2^k)-node subgraph to any other. In particular, for f(N) =
O(N^{1/2}), we can conclude that at most O(m^{1/2}) edges will link any m-node
subgraph produced by this process to any other subgraph.
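
As a quick numerical check of the last claim, the following sketch evaluates the
sum from the induction for f(N) = N^{1/2} and compares it with m^{1/2}, where
m = N/2^k. The function name boundary_edges and the choice N = 2^20 are ours
and serve only as an illustration. Under this choice of f the ratio works out to
2(2^{-1/2} + 2^{-2/2} + ... + 2^{-k/2}), which stays below 2/(2^{1/2} - 1).

```python
import math

def boundary_edges(N, k, f):
    """Bound on the edges leaving one (N/2^k)-node subgraph after k steps:
    f(N)/2^(k-1) + f(N/2)/2^(k-2) + ... + f(N/2^(k-2))/2 + f(N/2^(k-1))."""
    return sum(f(N / 2**j) / 2**(k - 1 - j) for j in range(k))

N = 2**20                       # illustrative size, not taken from the text
LIMIT = 2 / (math.sqrt(2) - 1)  # geometric-series bound on the ratio

ratios = []
for k in range(1, 21):
    m = N / 2**k                # size of each subgraph after k steps
    ratios.append(boundary_edges(N, k, math.sqrt) / math.sqrt(m))

for k, ratio in enumerate(ratios, start=1):
    print(k, round(ratio, 3))
```

The printed ratios increase with k but remain bounded by a constant, which is
the sense in which an m-node subgraph has only O(m^{1/2}) outgoing edges.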

Each subgraph produced by the above procedure corresponds in a natural way
to a mesh of the tree of meshes. For example, G corresponds to the root mesh, G_1
and G_2 correspond to the second level meshes, and so on. In general, each m-node
subgraph corresponds to a Θ(m)-node mesh. Thus each mesh can be used as a
switching network to embed the O(m^{1/2}) edges which link the corresponding
subgraph to other subgraphs. As an example of how this is done, we have
included Figure 6-3. In each switching network, the edges entering from the top
are linked to the edges entering from the sides. The nodes of G are embedded in
the bottom levels of the tree of meshes. □

Corollary 6-1: Every N-node planar graph can be embedded in an O(N log N)-node
tree of meshes.


Proof:  Obvious  □ 


(b) Embedding of M_{2,n} in T_{2n}

Although we have not worked out the details, it appears likely that any N-node
graph with an O(N^{1/2})-separator can be embedded in an O(N log N)-node tree of
meshes. In section 7.4.3, we prove a slightly weaker result; namely that every
N-node graph with an O(N^{1/2})-separator can be embedded in some O(N log N)-node
planar graph.

Of particular importance, however, is the fact that M_{2,n} can be embedded in T_{2n}
for any n. For example, consider the embedding of M_{2,4} in T_8 displayed in Figure
6-4. The embedding has been drawn as though it were constructed as part of a
larger embedding (say of M_{2,8}) in order to illustrate the recursive nature of the
general embedding procedure. In addition, the nodes and edges of M_{2,4} have been
drawn as they appear in Figure 6-1. For clarity, we have represented the nodes of
T_8 as pinpoints and omitted its edges altogether. Also notice that we have not
included the bottom two levels of T_8 since they are not needed for the embedding.

The embedding of M_{2,n} in T_{2n} for arbitrary n ≥ 4 proceeds as follows.

step 1: Remove the roots of the row and column trees of M_{2,n} and all the edges
incident to them.

step 2: Embed the four copies of M_{2,n/2} obtained from step 1 in four separate
copies of T_n by calling this procedure recursively.

step 3: Embed the 2n roots of the row and column trees in the 2n x 2n mesh
so that

1) the column roots are located at positions (i,i) for 1 ≤ i ≤ n/2 and
3n/2 < i ≤ 2n, and

2) the row roots are located at positions (2i-1, 2i-1) and (2i-1, 2i) for
n/4 < i ≤ 3n/4.

step 4: Draw left and right horizontal edges from each column root to the left
and right outer columns of the 2n x 2n mesh and then to the appropriate node in
the top row of the corresponding n x 2n mesh. Similarly draw two left edges
from each row root with position (2i-1, 2i-1) for some i and two right edges from
each row root with position (2i-1, 2i) for some i.

step 5: The n x 2n meshes are used as switching networks. In particular, we
use them to make the following connections:

1) (1, i) to (i, 1) for 1 ≤ i ≤ n/4 (column tree connection)

2) (1, i) to (i+n/2, 1) for n/4 < i ≤ n/2 (column tree connection)

3) (1, 2i-1) to (i, 1) for n/4 < i ≤ 3n/4 (row tree connection)

4) (1, 2i) to (i, 2n) for n/4 < i ≤ 3n/4 (row tree connection)

5) (1, i) to (5n/2 - i + 1, 2n) for 3n/2 < i ≤ 7n/4 (column tree connection)

6) (1, i) to (2n - i + 1, 2n) for 7n/4 < i ≤ 2n (column tree connection)

step 6: Each n x 2n mesh can be easily linked to two copies of T_n, each of
which contains an embedding of M_{2,n/2} produced by this procedure. In particular,
attach the wire leaving via the ith row of the n x 2n mesh to the node in the ith
column of the appropriate n x n mesh of T_n for each i. (Note that the nodes in the
n x n meshes are roots of M_{2,n/2} and will become second level nodes of M_{2,n}.)
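
The index ranges in step 5 are easy to misread, so a small consistency check may
be helpful. The sketch below assumes the following reading of the six
connections: each wire enters an n x 2n mesh at a top-row node (1, j) and leaves
at a node (row, 1) on the left side or (row, 2n) on the right side, with
1) (1,i) to (i,1), 2) (1,i) to (i+n/2,1), 3) (1,2i-1) to (i,1), 4) (1,2i) to
(i,2n), 5) (1,i) to (5n/2-i+1,2n) and 6) (1,i) to (2n-i+1,2n). The function name
step5_connections is ours. If this reading is right, every top-row node and every
side row should be used exactly once.

```python
def step5_connections(n):
    """Return (top_column, side, row) for each wire routed through an n x 2n mesh.
    Side 'L' stands for column 1 of the mesh and side 'R' for column 2n."""
    if n % 4 != 0:
        raise ValueError("n must be a multiple of 4")
    conns = []
    for i in range(1, n // 4 + 1):                   # connection 1 (column tree)
        conns.append((i, 'L', i))
    for i in range(n // 4 + 1, n // 2 + 1):          # connection 2 (column tree)
        conns.append((i, 'L', i + n // 2))
    for i in range(n // 4 + 1, 3 * n // 4 + 1):      # connections 3 and 4 (row tree)
        conns.append((2 * i - 1, 'L', i))
        conns.append((2 * i, 'R', i))
    for i in range(3 * n // 2 + 1, 7 * n // 4 + 1):  # connection 5 (column tree)
        conns.append((i, 'R', 5 * n // 2 - i + 1))
    for i in range(7 * n // 4 + 1, 2 * n + 1):       # connection 6 (column tree)
        conns.append((i, 'R', 2 * n - i + 1))
    return conns

def is_consistent(n):
    """True if top columns 1..2n and side rows 1..n (left and right) are each used once."""
    conns = step5_connections(n)
    tops = sorted(col for col, _, _ in conns)
    left = sorted(row for _, side, row in conns if side == 'L')
    right = sorted(row for _, side, row in conns if side == 'R')
    return (tops == list(range(1, 2 * n + 1))
            and left == list(range(1, n + 1))
            and right == list(range(1, n + 1)))

for n in (4, 8, 16, 32):
    print(n, is_consistent(n))
```

The check only confirms that the six index ranges fit together; it does not
route the wires inside the mesh.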

6.4  The  Augmented  Tree  of  Meshes 

As we mentioned in section 6.3.2, the N-node tree of meshes can be laid out so
that every wire has length at most O(log N). By slightly modifying the graph,
however, it is possible to increase the maximum edge length dramatically. The
basic idea is to add a complete binary tree with n^2 leaves to the n x n tree of meshes
so that the leaves of one are linked in a one-to-one fashion with the leaves of the
other. It is important that the attachments between the two graphs be made so that
the resulting graph (which we call the n x n augmented tree of meshes T_n') is planar.
For example, we have drawn the 4x4 augmented tree of meshes in Figure 6-5.

It is easily seen that the augmented tree of meshes has, up to a constant, the
same bisection width, diameter, separator, layout area and number of nodes as does
the original tree of meshes. By adding the binary tree, we have simply decreased
the distance between any two leaves of the tree of meshes. In Chapter 8, we will
show that any layout of the N-node tree of meshes has two leaves which are spaced
at least Ω(N^{1/2} log^{1/2} N) = Ω(n log n) apart. As the binary tree joins any two
leaves by a path of O(log n) edges, we will thus be able to conclude that the maximum
edge length of T_n' is at least Ω(n log n / log n) = Ω(N^{1/2}/log^{1/2} N). Using the
techniques developed by Bhatt and Leiserson in [BL81], it is not difficult to show
that the lower bound is attainable.
Figure 6-5: The 4x4 augmented tree of meshes


CHAPTER  7 


CROSSING  NUMBER  ARGUMENTS 


In this chapter, we demonstrate the power of the crossing number as a lower
bound technique for VLSI. We commence by showing that the crossing number is
at least as large (up to a constant) as the square of the bisection width. In section
7.2, we describe a powerful method for finding crossing number lower bounds.
This method is then used in section 7.3 to find tight lower bounds on the crossing
numbers of a variety of networks. We conclude in section 7.4 with a collection of
miscellaneous results. Included are additional upper and lower bounds for the
crossing number of a network as well as a procedure for embedding an arbitrary
N-node graph with an O(N^{1/2})-separator in an O(N log N)-node planar graph.

7.1  The  Relationship  Between  Crossing  Number  and  Layout  Area 

We  first  show  that  crossing  number  arguments  are  at  least  as  powerful  as 
bisection  width  arguments  in  establishing  lower  bounds  for  layout  area. 

Theorem 7-1: If G is an N-node graph with crossing number c and bisection
width b, then c + N ≥ Ω(b^2).

Proof: Let D be a drawing of G in the plane with c crossings. Replace each
crossing of D with an artificial node. Call the resulting graph G' and note that it
has precisely c+N nodes. Using the weighted version of the Lipton-Tarjan planar
separator theorem [LT77], it is possible to bisect the real nodes of G' (by assigning
weight 1 to the real nodes and weight 0 to the artificial nodes) without cutting
more than O((c+N)^{1/2}) edges. After replacing the artificial nodes with their
original edge crossings, it becomes apparent that we have, in fact, constructed an
O((c+N)^{1/2}) bisection for G. Squaring, we find that c+N ≥ Ω(b^2). □

Using a similar proof technique, we can show that the crossing number is also
close to an upper bound for the layout area of a graph. In fact, should a really
good layout algorithm for planar graphs be found, then the following result could
become useful in laying out arbitrary graphs.

Theorem 7-2: Given an optimal drawing D for an N-node graph G with crossing
number c, it is possible to construct a layout for G with area at most
O((c+N) log^2(c+N)). Should a procedure be found which lays out an arbitrary
N-node planar graph in A(N) area, then we could construct a layout for G with area at
most O(A(c+N)).

Proof: As in the proof of Theorem 7-1, we replace each edge crossing of D with
an artificial node. The resulting graph G' has c+N nodes and is planar. Using
the methods developed by Lipton and Tarjan [LT77] and Leiserson [L80a], G' can
be laid out in O((c+N) log^2(c+N)) area. It is then a simple matter to replace the
artificial nodes with their original edge crossings to obtain the desired layout for G.
Alternatively, should an A(N)-area planar graph layout procedure be discovered,
we could construct an O(A(c+N))-area layout for G. □

As  we  have  just  seen,  the  idea  of  replacing  edge  crossings  with  artificial  nodes  is 
simple  but  powerful.  Jai-Wei  and  Rosenberg  have  also  employed  this  strategy  in 
their work with embeddings of graphs in binary trees [JR81].

7.2  A  General  Method  for  Proving  Lower  Bounds 

In  this  section,  we  will  describe  a  general  method  for  proving  crossing  number 
lower  bounds.  A  variant  of  this  method  will  later  be  used  to  prove  lower  bounds 
for  bisection  width  and  wire  area.  The  basic  idea  is  as  follows. 

Given a drawing D for an N-node graph G, we will construct a drawing D' for
the complete graph on N nodes K_N by tracing over the edges of D. For example,
we have done this for the 4-node graph shown in Figure 7-1. The edges of the
original graph are drawn with dashed lines while solid lines indicate edges of K_4.

If we are careful not to trace over each edge of D too many times during the
construction of D', it may be possible to infer something about the number of
crossings in D by counting the number of crossings in D'. This is due to the fact
that the number of crossings in D is closely related to the number of crossings in
D'. For example, if e_1 and e_2 are edges of G which cross in D and e_1 is traced
over s_1 times while e_2 is traced over s_2 times, then the crossing of e_1 with e_2 will
appear s_1 s_2 times in D'. Such a crossing of D' is called a crossing of the first kind.
For example, there are four crossings of the first kind in the drawing of K_4 in
Figure 7-1.

Figure 7-1: Construction of K_4 from the drawing of a 4-node graph. (Labels in the
figure mark the crossings of the first kind and the crossing of the second kind.)


Sometimes, it is necessary for two edges of D' to cross while traversing the same
edge of D. Such a crossing is called a crossing of the second kind. Note that there
is only one crossing of the second kind in the drawing of K_4 in Figure 7-1. Since
D' can easily be drawn so that no pair of edges cross each other more than once,
there are usually not very many crossings of the second kind. More precisely, if G
has edges e_1, ..., e_k and if edge e_i is traced over s_i times for each i during the
construction of D', then D' can have at most

    Σ_{i=1}^{k} s_i^2/2

crossings of the second kind. For most applications of the method, this number is
substantially smaller than the number of crossings of the first kind in D' and thus
we usually do not have to worry about crossings of the second kind.

By showing that the number of crossings in D' is large, we can conclude that
there must be a large number of crossings in D. For example, if each edge of D is
traced over at most s times during the construction of D' and D' is found to have
y crossings, then we can conclude that D has at least y/s^2 crossings. This follows
from the fact that each crossing of D is replicated at most s^2 times in D'. (Note
that we have neglected crossings of the second kind in this argument.)
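
The parenthetical remark can be made explicit with one line of arithmetic. The
following is only a sketch of the accounting under the assumptions already stated
in this section: G has k edges, every edge is traced over at most s times, D has
c crossings, and D' has y crossings in total.

```latex
y \;\le\; \underbrace{s^{2}\,c}_{\text{first kind}}
      \;+\; \underbrace{\sum_{i=1}^{k}\frac{s_i^{2}}{2}}_{\text{second kind}}
  \;\le\; s^{2}\,c + \frac{k\,s^{2}}{2}
\qquad\Longrightarrow\qquad
c \;\ge\; \frac{y}{s^{2}} - \frac{k}{2}.
```

Thus neglecting crossings of the second kind costs at most an additive k/2, which
is harmless whenever y/s^2 is substantially larger than the number of edges of G.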

Fortunately, it is easy to find a good lower bound on the number of crossings in
any drawing of K_N. We state the result formally in the following lemma. The
proof can also be found in Kleitman's work [K70] but is generally regarded as
folklore.


Lemma 7-1 (Kleitman [K70]): The crossing number of K_N, the complete graph
on N nodes, is at least N(N-1)(N-2)(N-3)/120 for N ≥ 5.

Proof: Let D be a drawing of K_N in the plane with the smallest possible number
of crossings c(N). We may assume that no pair of edges which cross in D are
incident to a common node. Otherwise, it would be possible to produce a drawing
D' for K_N with c(N)-1 crossings by exchanging the parts of the crossing edges
which lie between the common node and the point of crossing. This would
contradict the minimality of c(N).

Consider the N subdrawings of D obtained by deleting one of the nodes and all
of the edges incident to it. Note that each crossing of D appears in precisely N-4
of the subdrawings. (A crossing does not appear in any of the 4 subdrawings
which correspond to the deletion of a node incident to an edge of the crossing.)
Since each of the subdrawings is a drawing of K_{N-1}, each must have at least c(N-1)
crossings. Thus (N-4)c(N) ≥ N c(N-1). Applying the inequality recursively and
noting that c(5) = 1, we can conclude that

    c(N) ≥ [N/(N-4)] [(N-1)/(N-5)] ... [6/2]
         = N(N-1)(N-2)(N-3)/120 for N ≥ 5. □
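
The telescoping product in the last step can be checked mechanically. The sketch
below unrolls the recurrence (N-4)c(N) ≥ N c(N-1) from c(5) = 1 using exact
rational arithmetic and compares the result with the closed form; the function
name kn_crossing_lower_bound is ours.

```python
from fractions import Fraction

def kn_crossing_lower_bound(N):
    """Unroll c(N) >= [N/(N-4)] c(N-1) starting from c(5) = 1."""
    bound = Fraction(1)                 # c(5) = 1
    for m in range(6, N + 1):
        bound *= Fraction(m, m - 4)     # one application of the inequality
    return bound

def closed_form(N):
    """The bound N(N-1)(N-2)(N-3)/120 stated in Lemma 7-1."""
    return Fraction(N * (N - 1) * (N - 2) * (N - 3), 120)

for N in range(5, 12):
    print(N, kn_crossing_lower_bound(N), closed_form(N))
```

The two columns agree for every N, since the product collapses to
N!/(120 (N-4)!).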

7.3  Applications 

Using the technique described in the previous section, it is possible to prove
crossing number lower bounds for a variety of networks. In particular, we will
prove lower bounds for the shuffle-exchange graph, the 2-dimensional mesh of
trees and the r-dimensional mesh of trees. We commence with the shuffle-
exchange graph.

7.3.1  Lower  Bounds  for  the  Shuffle-Exchange  Graph 

Our  main  result  in  this  section  is  the  following. 

Theorem 7-3: The crossing number of the N-node shuffle-exchange graph is
Θ(N^2/log^2 N).

Proof: As we showed in Part I of the thesis, the N-node shuffle-exchange graph
has layout area O(N^2/log^2 N). Thus O(N^2/log^2 N) is an upper bound for the
crossing number. In what follows, we will use the method of section 7.2 in order to
show that the crossing number of the N-node shuffle-exchange graph is at least
Ω(N^2/log^2 N).

Let D be any drawing of the N-node shuffle-exchange graph G where N = 2^k.
We first show how to construct a drawing D' of K_N on the nodes of G without
tracing over any edge of D more than N log N times.

Given any pair of nodes a_k ... a_1 and b_k ... b_1, draw the edge from
a_k ... a_1 to b_k ... b_1 along the path

    a_k ... a_3 a_2 a_1  →  a_k ... a_3 a_2 b_1  →
    b_1 a_k ... a_3 a_2  →  b_1 a_k ... a_3 b_2  →
    b_2 b_1 a_k ... a_3  →  ...  →
    b_{k-1} ... b_2 b_1 b_k  →  b_k b_{k-1} ... b_2 b_1.

(In order that every edge of K_N not be drawn twice, we should assume that the
value of a_k ... a_1 is less than that of b_k ... b_1 but this has no bearing on the
argument.)

Wherever a_i = b_i for some i, the preceding path will have a loop. When actually
drawing the edges of D', we ignore such loops. For example, the edge from 01100
to 11101 is drawn along the path

    01100 -E→ 01101 -S→ 10110 -S→ 01011 -S→ 10101 -S→ 11010 -E→ 11011 -S→ 11101.

For convenience, we have labeled the shuffle edges with an S and the
exchange edges with an E. Note also that we have omitted loops at 10110,
01011 and 10101.
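
The routing rule can be sketched in a few lines. In the code below, nodes are
k-bit integers, the exchange step overwrites the last bit with b_i, and the
shuffle step moves the last bit to the front; steps that would produce a loop are
dropped. The function name se_path is ours.

```python
def se_path(a, b, k):
    """Nodes visited when routing from a to b (both k-bit integers):
    for i = 1..k, set the last bit to b_i (exchange), then move it to the
    front (shuffle). Steps that leave the node unchanged are omitted."""
    path = [a]
    x = a
    for i in range(k):
        b_i = (b >> i) & 1
        x = (x & ~1) | b_i                      # exchange: last bit becomes b_i
        if x != path[-1]:
            path.append(x)
        x = (x >> 1) | ((x & 1) << (k - 1))     # shuffle: last bit moves to the front
        if x != path[-1]:
            path.append(x)
    return path

example = [format(x, "05b") for x in se_path(0b01100, 0b11101, 5)]
print(" -> ".join(example))
```

For the pair 01100 and 11101 this visits 01100, 01101, 10110, 01011, 10101,
11010, 11011 and 11101 in that order.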


It is not difficult to show that every edge of D is traced over at most N log N
times during the construction of D'. For example, consider the shuffle edge
linking a_k ... a_1 to a_1 a_k ... a_2. It is traced over during the construction of
edges of D' which link a node of the form

    a_{k-i+1} ... a_2 * ... *

to a node of the form

    * ... * a_1 a_k ... a_{k-i+2}

for any i, 1 ≤ i ≤ k (where * indicates either a 0-bit or a 1-bit). It is easily seen that
there are at most k 2^k such edges in D' and thus each shuffle edge is traced over at
most N log N times. A similar argument shows that each exchange edge is also
traced over at most N log N times.
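
For small k the bound can also be confirmed by brute force. The sketch below
routes every pair of nodes as described above (exchange the last bit, then move
it to the front, ignoring loops) and counts how often each edge of the
shuffle-exchange graph is traced over. The helper names se_path and
max_trace_count are ours, and the check is run only for k = 3 and k = 5.

```python
from collections import Counter
from itertools import combinations

def se_path(a, b, k):
    """Route from a to b: for i = 1..k set the last bit to b_i, then rotate it
    to the front. Steps that leave the node unchanged (loops) are omitted."""
    path = [a]
    x = a
    for i in range(k):
        x = (x & ~1) | ((b >> i) & 1)           # exchange step
        if x != path[-1]:
            path.append(x)
        x = (x >> 1) | ((x & 1) << (k - 1))     # shuffle step
        if x != path[-1]:
            path.append(x)
    return path

def max_trace_count(k):
    """Largest number of times any single edge is traced over all pairs a < b."""
    traces = Counter()
    for a, b in combinations(range(2**k), 2):
        p = se_path(a, b, k)
        for u, v in zip(p, p[1:]):
            traces[frozenset((u, v))] += 1
    return max(traces.values())

for k in (3, 5):
    print(k, max_trace_count(k), k * 2**k)      # observed maximum vs. N log N
```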

Since each edge is traced over at most N log N times, there can be at most

    (3N/2)[(N log N)^2/2] = 3N^3 log^2 N/4

crossings of the second kind in D'. This is substantially less than the total number
Θ(N^4) of crossings in D'. Thus D' must have Ω(N^4) crossings of the first kind.
As each edge of D is traced over at most N log N times, this means that D has at
least Ω(N^4/(N log N)^2) = Ω(N^2/log^2 N) crossings. □

As the N-node shuffle-exchange graph has Θ(N) edges, we can conclude from
Theorem 7-1 that some edge of any layout for the graph must cross at least
Ω(N/log^2 N) other edges. We do not know whether or not this bound can be
achieved, however. The only known layouts for the N-node shuffle-exchange
graph have edges which cross at least Ω(N/log N) other edges.

It is also worth pointing out that the preceding argument can be used to prove
that the N-node shuffle-exchange graph has bisection width at least Ω(N/log N).
The result follows from the observation that K_N has bisection width Θ(N^2) and the
fact that every edge of D was traced over at most N log N times during the
construction of D'. This means that the bisection width of the N-node shuffle-
exchange graph is at least Ω(N^2/(N log N)) = Ω(N/log N), as claimed.

In  fact,  a  similar  modification  of  the  method  described  in  section  7.2  can  be  used 
to  find  tight  bisection  width  lower  bounds  for  all  of  the  networks  we  have 
investigated.  For  most  of  these  networks,  however,  it  is  much  more  useful  to  study 
the  corresponding  crossing  number  and  wire  area  bounds. 

7.3.2  Lower  Bounds  for  the  2-Dimensional  Mesh  of  Trees 

In  this  section,  we  use  a  more  sophisticated  version  of  the  method  of  section  7.2 
to  prove  a  nontrivial  lower  bound  on  the  crossing  number  of  the  2-dimensional 
mesh  of  trees. 

Theorem 7-4: The crossing number of the N-node 2-dimensional mesh of trees is
at least Ω(N log N).

Proof: As before, let M_{2,n} denote the 2-dimensional mesh of trees (where n is a
power of 2). We will show that the crossing number of M_{2,n} is at least

    (n^2 log n - 121n^2 + 121n)/40 for all n ≥ 1.

Since M_{2,n} has N = Θ(n^2) nodes, this will be sufficient to prove the desired result.

The proof consists of two steps. In the first, we show how to construct a drawing
of K_{n^2} from any drawing of M_{2,n} by tracing over the edges of M_{2,n}. We then
apply Lemma 7-1 to conclude that there are a large number of crossings among the
edges in the top levels of the binary trees of M_{2,n}. In the second step, we
complete the proof by inductively applying the result of the first step.

step 1: Let D be any drawing of M_{2,n} in the plane. From this drawing, we can
construct a drawing D' of K_{n^2} in the following way. First locate the n^2 leaves of
the binary trees of D. They will serve as the nodes for K_{n^2}. Given any pair (i,j)
and (k,l) of these nodes, draw an edge from (i,j) to (k,l) along the unique path
from (i,j) to (i,l) in the ith row tree of D and then from (i,l) to (k,l) in the lth
column tree of D. (In order that each edge not be drawn twice, we shall assume
that i ≤ k and, when i = k, that j < l.) As usual, we assume that the edges of D' are
drawn so that no pair cross each other more than once.

We next count the number of crossings of the second kind in D'. In order to
do this, we need to count the number of times each edge of D is traced over during
the construction of D'. It is not difficult to show that each edge in the ith level of
a binary tree of M_{2,n} (henceforth referred to as a type i edge) is traced over at most

    n 2^{-i} (n^2 - n^2 2^{-i}) < n^3 2^{-i}

times for any i ≤ log n during the construction of D'. Thus at most n^6 2^{-2i-1} crossings
of the second kind can occur at any type i edge of D. Since there are 2^{i+1} n type i
edges in M_{2,n}, we can conclude that the total number of crossings of the second kind
in D' is at most

    Σ_{i=1}^{log n} (2^{i+1} n)(n^6 2^{-2i-1}) = n^7 Σ_{i=1}^{log n} 2^{-i} < n^7.

We next count the number of crossings of the first kind (i.e., those
corresponding to crossings in D). We say that a crossing of D is type i-j if it is the
crossing of a type i edge and a type j edge. Let t_{ij} denote the number of type i-j
crossings in D and set t_i = Σ_{j≥i} t_{ij}. Since each type i edge is traced over at most
n^3 2^{-i} times, each type i-j crossing of D produces at most (n^3 2^{-i})(n^3 2^{-j}) = n^6 2^{-i-j}
crossings of the first kind in D'. Thus the total number of crossings of the first kind
in D' is at most

    n^6 Σ_{i≥1} Σ_{j≥i} 2^{-i-j} t_{ij} ≤ n^6 Σ_{i≥1} 2^{-2i} t_i.

Summing, we find that the total number of crossings of either kind in D' is at
most n^7 + n^6 Σ_{i≥1} 2^{-2i} t_i. By Lemma 7-1, this number must be at least
n^2(n^2-1)(n^2-2)(n^2-3)/120 for n^2 ≥ 5. Simplifying, we can conclude that

    Σ_{i≥1} 2^{-2i} t_i ≥ (n^2 - 121n)/120 for n ≥ 6.

Let s_k = Σ_{i=1}^{k} t_i be the number of crossings involving at least one edge from the
top k levels of some binary tree of M_{2,n}. In what follows, we will use the
preceding inequality to show that s_k ≥ (n^2 - 121n)k/40 for at least some value of
k ≥ 1. Assume otherwise and observe that

    Σ_{i≥1} 2^{-2i} t_i = Σ_{i≥1} 2^{-2i} (s_i - s_{i-1})

where s_0 is defined to be 0. The coefficient of each s_i (i ≠ 0) in this sum is
2^{-2i} - 2^{-2i-2}, which is positive, so for each i we may substitute (n^2 - 121n)i/40 as
an upper bound for s_i in order to see that

    Σ_{i≥1} 2^{-2i} t_i < [(n^2 - 121n)/40] Σ_{i≥1} 2^{-2i} [i - (i-1)]
                      = [(n^2 - 121n)/40] Σ_{i≥1} 2^{-2i}.

Since Σ_{i≥1} 2^{-2i} < 1/3 for all n, we can conclude that

    Σ_{i≥1} 2^{-2i} t_i < (n^2 - 121n)/120 for all n > 121,

a contradiction. Thus for all n > 121, there is a k ≥ 1 such that s_k ≥ (n^2 - 121n)k/40.


step 2: Let c(n) denote the crossing number of M_{2,n}. Using the result of step 1,
we will now show by induction on n that c(n) ≥ (n^2 log n - 121n^2 + 121n)/40 for all
n ≥ 1.

As (n^2 log n - 121n^2 + 121n)/40 is nonpositive for small n, the lower bound
trivially holds for all n ≤ 128. Assume that the lower bound holds for all n' < n where
n > 128 and let D be any drawing for M_{2,n}. By counting the crossings of D in two
groups according to whether or not at least one edge of the crossing is contained in
the top k levels of the binary trees of M_{2,n}, we can observe that

    c(n) ≥ 2^{2k} c(n 2^{-k}) + s_k.

(Recall the definition of s_k and the structure of M_{2,n}.) By choosing k as in step 1
so that s_k ≥ (n^2 - 121n)k/40 and applying the inductive hypothesis for c(n 2^{-k}), we
obtain

    c(n) ≥ 2^{2k} [n^2 2^{-2k} (log n - k)/40 - 121n^2 2^{-2k}/40 + 121n 2^{-k}/40]
             + n^2 k/40 - 121nk/40
         = n^2 log n/40 - 121n^2/40 + 121n/40 + 121n(2^k - k - 1)/40
         ≥ (n^2 log n - 121n^2 + 121n)/40.

Thus the inductive hypothesis is established and we can conclude that the
crossing number of M_{2,n} is at least Ω(n^2 log n) = Ω(N log N). □
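
The algebra in step 2 is easy to get wrong, so we include a small exact check.
The sketch below evaluates 2^{2k} times the inductive bound for n 2^{-k}, adds
the lower bound (n^2 - 121n)k/40 on s_k, and confirms that the result exceeds the
target bound for n by exactly 121n(2^k - k - 1)/40. The function names target and
step2_rhs are ours; n is taken to be a power of 2, as in the proof.

```python
from fractions import Fraction

def target(n, log_n):
    """The claimed bound (n^2 log n - 121 n^2 + 121 n)/40."""
    return Fraction(n * n * log_n - 121 * n * n + 121 * n, 40)

def step2_rhs(log_n, k):
    """2^(2k) * target(n / 2^k) + (n^2 - 121 n) k / 40, with n = 2^log_n."""
    n = 2**log_n
    sub = n // 2**k                                  # side of each remaining copy
    return 2**(2 * k) * target(sub, log_n - k) + Fraction((n * n - 121 * n) * k, 40)

def slack(log_n, k):
    """Amount by which the right-hand side exceeds the target bound for n."""
    return step2_rhs(log_n, k) - target(2**log_n, log_n)

for log_n in (8, 9, 10):
    for k in (1, 2, log_n):
        n = 2**log_n
        print(n, k, slack(log_n, k) == Fraction(121 * n * (2**k - k - 1), 40))
```

Since 2^k - k - 1 is nonnegative for every k ≥ 1, the slack is never negative,
which is what the last inequality of the derivation needs.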

In section 7.4.3, we will show that the crossing number of any N-node graph
with an O(N^{1/2})-separator is at most O(N log N). Thus, we will be able to conclude
that the crossing number of the N-node 2-dimensional mesh of trees is precisely
Θ(N log N).

7.3.3 Lower Bounds for the r-Dimensional Mesh of Trees

By modifying the proof of Theorem 7-4, it can be shown that any layout of the
r-dimensional mesh of trees must have very long wires. In particular, they must be
as long as the width of any optimal layout for the graph. We state this result more
precisely in the following theorem.

Theorem 7-5: Any drawing of the N-node r-dimensional mesh of trees contains
an edge which crosses at least Ω(N^{1-1/r}) other edges.

Proof: The r-dimensional n x n x ... x n mesh of trees M_{r,n} has
N = (r+1)n^r - r n^{r-1} = Θ(n^r) nodes for bounded r. We will show that any layout
D of M_{r,n} contains an edge which crosses at least Ω(n^{r-1}) = Ω(N^{1-1/r}) other edges,
thus proving the theorem. The method used is very similar to that of Theorem 7-4.

As we did for the case of r = 2 in Theorem 7-4, we first construct a drawing D' of
the complete graph on the n^r leaves of M_{r,n}. Each type i edge of D is traced over
at most n^{r+1} 2^{-i} times by this procedure. Thus the total number of crossings in D'
is at most

    (r n^{3r+1})/2 + n^{2r+2} Σ_{i≥1} 2^{-2i} t_i

where, as before, t_i = Σ_{j≥i} t_{ij} and t_{ij} is the number of type i-j crossings in D.
Applying Lemma 7-1, we can conclude that Σ_{i≥1} 2^{-2i} t_i ≥ Ω(n^{2r-2}).

Let s_k = Σ_{i=1}^{k} t_i be the total number of crossings of D involving an edge from the
top k levels of the binary trees in M_{r,n}. Using arguments similar to those used to
prove Theorem 7-4, it is not difficult to show that for large n, there exists a k such
that s_k ≥ Ω(n^{2r-2} 2^k). As there are only r n^{r-1}(2^{k+1} - 2) edges in the top k levels of
M_{r,n} for any k, we can conclude that at least one of them crosses at least Ω(n^{r-1})
other edges. □

It is worth pointing out that the preceding arguments can also be used to show
that the crossing number of the N-node r-dimensional mesh of trees is Θ(N^{2-2/r})
for bounded r > 2.

7.4  Further  Methods 

In this section, we describe some additional methods for proving crossing
number bounds. We first generalize Lemma 7-1 to prove a combinatorial lower
bound on the crossing number of any N-node graph with at least 4N edges. This
result is then used in section 7.4.2 to prove crossing number lower bounds for a
class of graphs which are similar to the 2-dimensional mesh of trees. We conclude
by proving a nontrivial upper bound on the crossing number of graphs which have
O(N^{1/2})-separators. As a corollary, we will show that any N-node graph with an
O(N^{1/2})-separator can be embedded in some O(N log N)-node planar graph, thus
generalizing Theorem 6-1.

7.4.1 A Combinatorial Lower Bound for Crossing Numbers

In  this  section,  we  substantially  generalize  the  result  of  Lemma  7-1. 
Throughout,  we  assume  that  G  is  a  simple  graph  (i.e.,  that  it  has  no  loops  or 
multiple  edges). 

Theorem 7-6: If G is a graph with E edges and N nodes where E ≥ 4N, then the
crossing number of G is at least E^3/375N^2.

Proof: The proof is by induction on N. For N = 1, the result is vacuously true.
Assume that the result is true for all N' < N where N > 1 and let G be a graph with
N nodes and E edges where E ≥ 4N. We will show that the crossing number c of
G is at least E^3/375N^2, thus proving the theorem. There are two cases to
consider.

case 1: 4N ≤ E ≤ 5N.

We  first  use  Euler's  formula  [BLW76]  in  order  to  show  that  the  genus  of  G  is 
large.  Euler's  formula  states  that 

E+2=N  +  f+2g 

where f is the number of faces of any proper embedding of G on a surface of
genus g. Since G has no loops or multiple edges, every face contains at least 3
edges and thus 3f ≤ 2E. Substituting, we find that

    2g = E + 2 - N - f
       ≥ E + 2 - N - (2E/3)
       = E/3 + 2 - N

and thus that g ≥ (E-3N)/6. For 4N ≤ E ≤ 5N, it is not difficult to show that
(E-3N)/6 ≥ E^3/375N^2 and thus that g ≥ E^3/375N^2.

Given  any  graph  with  crossing  number  c,  it  is  possible  to  find  a  proper 
embedding  of  the  graph  on  a  surface  with  genus  c.  We  can  do  this  by  drawing  the 
graph  on  a  sphere  so  that  only  c  pairs  of  edges  cross  and  then  putting  a  "handle" 
in  the  region  immediately  surrounding  each  crossing.  The  edges  of  the  crossing 
can  then  be  redrawn  through  the  handle  so  that  they  no  longer  cross.  As  the 
resulting surface has genus c, we can conclude that g ≤ c for any graph with genus g
and crossing number c. In particular, we can conclude that c ≥ E^3/375N^2 for G.

case 2: E ≥ 5N.

Let d_1, ..., d_N be the degrees of the N nodes of G and let D be an optimal
drawing of G. As usual, we can assume that no pair of edges which cross in D are
incident to the same node of G. Consider the subdrawing D_i of D obtained by
deleting the ith node of G and all the edges incident to it. This subdrawing is also
a drawing of a graph with N-1 nodes and E-d_i edges. Since E ≥ 5N and d_i ≤ N-1, we
can conclude that

    E - d_i ≥ 4N + 1 > 4(N-1).

Thus we can apply the inductive hypothesis to D_i in order to conclude that it has at
least (E-d_i)^3/[375(N-1)^2] crossings.

Each  crossing  of  D  will  appear  in  precisely  N-4  of  the  N  subdrawings  of  D 
produced  by  the  above  procedure.  Applying  the  technique  used  to  prove  Lemma 
7-1,  we  can  thus  conclude  that 

$$c \ge \frac{1}{N-4} \sum_{i=1}^{N} \frac{(E-d_i)^3}{375(N-1)^2}$$

$$= \frac{1}{375(N-4)(N-1)^2} \sum_{i=1}^{N} (E^3 - 3E^2 d_i + 3E d_i^2 - d_i^3)$$

$$= \frac{1}{375(N-4)(N-1)^2} \Big[ E^3 N - 3E^2(2E) + \sum_{i=1}^{N} (3E d_i^2 - d_i^3) \Big].$$

Since $\sum_{i=1}^{N} d_i = 2E$, it is not difficult to show that
$\sum_{i=1}^{N} (3E d_i^2 - d_i^3)$ attains its minimal value when $d_i = 2E/N$ for
$1 \le i \le N$. At this point,

$$\sum_{i=1}^{N} (3E d_i^2 - d_i^3) \ge 12E^3/N - 8E^3/N^2$$

and thus

$$c \ge (E^3 N - 6E^3 + 12E^3/N - 8E^3/N^2) / [375(N^3 - 6N^2 + 9N - 4)].$$

For $N \ge 2$, this expression can easily be reduced to show that $c \ge E^3/375N^2$ □
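
The final reduction can be checked mechanically. The following Python sketch is an illustration added here and is not part of the original argument; the function names are hypothetical. It evaluates the right-hand side of the last bound with exact rational arithmetic and confirms that it is at least $E^3/375N^2$ over a range of $N$ for which $N - 4 > 0$.

```python
from fractions import Fraction

def case2_bound(E: int, N: int) -> Fraction:
    """Lower bound on c from case 2 when every degree equals 2E/N."""
    numerator = E**3 * (N - 6 + Fraction(12, N) - Fraction(8, N * N))
    denominator = 375 * (N**3 - 6 * N**2 + 9 * N - 4)
    return numerator / denominator

def target_bound(E: int, N: int) -> Fraction:
    """The bound E^3 / (375 N^2) claimed by the theorem."""
    return Fraction(E**3, 375 * N * N)

def reduction_holds(N: int) -> bool:
    E = 5 * N  # any E >= 5N behaves the same way, since E^3 is a common factor
    # The denominator factors as (N-4)(N-1)^2; the numerator is E^3 (N-2)^3 / N^2.
    assert N**3 - 6 * N**2 + 9 * N - 4 == (N - 4) * (N - 1) ** 2
    return case2_bound(E, N) >= target_bound(E, N)

print(all(reduction_holds(N) for N in range(5, 500)))
```

After clearing denominators the comparison amounts to $(N-2)^3 \ge (N-4)(N-1)^2$, that is, $3N \ge 4$.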

It is interesting to note that the lower bound proved in Theorem 7-6 is (up to a
constant) tight. For example, the $N$-node graph consisting of $N^2/E$ disjoint copies
of $K_{E/N}$ has $\Theta(E)$ edges and crossing number at most $O(E^3/N^2)$ for any $E \ge 4N$.
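
As a rough numerical illustration of this tightness claim (a sketch added here, not taken from the thesis), the Python below builds the parameters of a disjoint union of cliques and compares a known upper bound on its crossing number with the lower bound of Theorem 7-6. It assumes the classical upper bound $Z(m) = \frac{1}{4}\lfloor m/2 \rfloor \lfloor (m-1)/2 \rfloor \lfloor (m-2)/2 \rfloor \lfloor (m-3)/2 \rfloor$ on the crossing number of $K_m$; the function names are hypothetical.

```python
from math import comb

def clique_crossing_upper(m: int) -> int:
    """Classical upper bound Z(m) on the crossing number of K_m."""
    return ((m // 2) * ((m - 1) // 2) * ((m - 2) // 2) * ((m - 3) // 2)) // 4

def disjoint_cliques(max_nodes: int, clique_size: int):
    """Return (nodes, edges, crossing upper bound) for disjoint copies of K_m."""
    copies = max_nodes // clique_size
    nodes = copies * clique_size
    edges = copies * comb(clique_size, 2)
    crossings_upper = copies * clique_crossing_upper(clique_size)
    return nodes, edges, crossings_upper

for m in (9, 16, 32, 64):
    nodes, edges, upper = disjoint_cliques(1024, m)
    lower = edges**3 / (375 * nodes**2)   # Theorem 7-6, valid since edges >= 4 * nodes
    print(m, nodes, edges, round(lower), upper)
```

In each row the upper bound stays within a constant factor of $E^3/N^2$, which is all that tightness up to a constant requires.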

7.4.2  Applications 

When defining the 2-dimensional mesh of trees, we required that the binary
trees be interconnected so that $M_{2,n}$ contain $2^{2k}$ disjoint copies of $M_{2,n/2^k}$ as
subgraphs for any $k$. Not only is this definition the most natural, but it also allows
us to use induction in the lower bound proofs for the network. Surprisingly,
however, the constraint is not necessary in order to show that $M_{2,n}$ can perform
matrix-vector multiplication, sorting or switching in $O(\log n)$ time. In fact, any
network consisting of $n$ row trees and $n$ column trees which share the same set of
leaves can do these operations quickly. Thus it is conceivable that some other
arrangement of the tree interconnections might lead to a network with a smaller
crossing number. In what follows, we use Theorem 7-6 to show that this is not the
case.


Theorem 7-7: If $G$ is an $N$-node graph formed in the same way as the $n \times n$ mesh
of trees except that arbitrary interconnections are allowed between the leaves of
binary trees, then $G$ must have crossing number at least $\Omega(N \log N)$.

Proof: Let $G_k$ denote the subgraph of $G$ obtained by deleting the nodes and
edges in the top $k$ levels of the binary trees of $G$ for $0 \le k < \log n$. For example, if
$G = M_{2,n}$, then $G_k$ consists of $2^{2k}$ disjoint copies of $M_{2,n/2^k}$. Otherwise, $G_k$ is a
graph for which each node of the original $n \times n$ matrix of nodes is a leaf of a
horizontal complete binary tree of depth $\log n - k$ and a leaf of a vertical complete
binary tree of depth $\log n - k$. For each $k$, let $H_k$ denote the graph whose nodes
are the $n^2$ leaves of $G_k$ and whose edges are the paths in $G_k$ of the form

leaf - path in horizontal binary tree - leaf - path in vertical binary tree - leaf.

Note that if $G = M_{2,n}$, then $H_k$ consists of $2^{2k}$ disjoint copies of $K_{n^2 2^{-2k}}$. In any
case, $H_k$ is a regular graph for which each node has degree $n^2 2^{-2k} - 1$.

Given any drawing $D_k$ of $G_k$, it is easy to construct a drawing $D_k'$ for $H_k$ by
tracing over the edges of $G_k$ in the natural way. It is not difficult to see that each
type $i$ edge of $G$ is traced over at most $(2^{\log n - k})^2\, 2^{\log n - i} = n^3 2^{-2k-i}$ times by this
procedure for $i > k$. Thus each type $i$-$j$ crossing is reproduced at most
$n^6 2^{-4k-i-j} \le n^6 2^{-4k-2i}$ times for $j \ge i > k$.

Given any drawing $D$ of $G$, construct $2^{6k}$ separate drawings $D_k'$ of $H_k$ for each
$k \ge 0$. Each type $i$-$j$ crossing of $D$ will appear a total of at most

$$\sum_{k<i} 2^{6k}\, n^6 2^{-4k-2i} = n^6 2^{-2i} \sum_{k<i} 2^{2k} \le O(n^6)$$

times in these drawings. In what follows, we will show that there are at least
$\Omega(n^8 \log n)$ total crossings of the first kind in these drawings. We will thus be able
to conclude that the crossing number of $G$ is at least $\Omega(n^2 \log n)$.

As $H_k$ has $E_k = \Theta(n^4 2^{-2k})$ edges and $N_k = n^2$ nodes, we can apply Theorem
7-6 to conclude that $D_k'$ has at least $\Omega(E_k^3/N_k^2) = \Omega(n^8 2^{-6k})$ crossings. Thus
there are at least $\Omega(n^8)$ crossings among the $2^{6k}$ drawings $D_k'$. Summing over $k$
for $0 \le k < \log n$, we find that there are at least $\Omega(n^8 \log n)$ total crossings among all of
the drawings $\{D_k' \mid 0 \le k < \log n\}$. It is not difficult to check that there are at most
$O(n^7 2^{-5k})$ crossings of the second kind in each drawing of $H_k$. As there are
$2^{6k}$ such drawings for each $k$, we can conclude that there are at most

$$\sum_{k=0}^{\log n - 1} (n^7 2^{-5k})\, 2^{6k} \le O(n^8)$$

total crossings of the second kind in all the drawings $\{D_k' \mid 0 \le k < \log n\}$. Thus
there are at least $\Omega(n^8 \log n)$ total crossings of the first kind and the crossing number
of $G$ is at least $\Omega((n^8 \log n)/n^6) = \Omega(n^2 \log n) = \Omega(N \log N)$ □
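
The counting in this proof can be sanity-checked for a concrete $n$. The Python sketch below is an illustration added here under stated assumptions: every hidden constant is taken to be 1, the multiplicity of a crossing in the drawings for a given $k$ is read as $n^6 2^{-4k-2i}$, and values of $k$ for which $H_k$ fails the hypothesis $E_k \ge 4N_k$ of Theorem 7-6 are skipped. The function names are hypothetical.

```python
from fractions import Fraction

def first_kind_scale(n: int) -> Fraction:
    """Sum over k of 2^{6k} * E_k^3 / N_k^2, skipping k where E_k < 4 N_k."""
    total = Fraction(0)
    k = 0
    while 2**k < n:
        degree = n * n // 4**k - 1        # degree of each node of H_k
        edges = n * n * degree // 2       # E_k
        nodes = n * n                     # N_k
        if edges >= 4 * nodes:            # hypothesis of Theorem 7-6
            total += 64**k * Fraction(edges**3, nodes**2)
        k += 1
    return total

def max_multiplicity(n: int) -> Fraction:
    """Largest value over i of sum_{k<i} 2^{6k} * n^6 * 2^{-4k-2i}."""
    levels = n.bit_length() - 1           # log2(n) when n is a power of two
    return max(
        sum(Fraction(64**k * n**6, 16**k * 4**i) for k in range(i))
        for i in range(1, levels + 1)
    )

n = 16
print(float(first_kind_scale(n) / n**8), float(max_multiplicity(n) / n**6))
```

The first printed ratio grows like $\log n$ while the second stays below $1/3$, which matches the $\Omega(n^8 \log n)$ and $O(n^6)$ estimates used above.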

As a corollary, we can see once again that the crossing number of $M_{2,n}$ is at least
$\Omega(N \log N)$.

7.4.3  An  Upper  Bound  for  Crossing  Numbers 

Since any $N$-node graph with an $O(N^{\alpha})$-separator for some $\alpha > 1/2$ has an
$O(N^{2\alpha})$-area layout, we can easily see that it also has crossing number at most
$O(N^{2\alpha})$. By Theorem 7-1, we can conclude that this bound is tight since many
such graphs also have bisection width at least $\Omega(N^{\alpha})$.

The situation is not as clear for graphs with $O(N^{1/2})$-separators, however. For
example, the best known upper bound on the layout area of an $N$-node graph with
an $O(N^{1/2})$-separator is $O(N \log^2 N)$ yet no such graph is known to have a crossing
number greater than $\Omega(N \log N)$. In what follows, we prove a tight upper bound on
the crossing number of any such graph.

Theorem 7-8: The crossing number of any $N$-node graph with an $O(N^{1/2})$-
separator is at most $O(N \log N)$.

Proof: Given such a graph $G$, we will construct a drawing for $G$ with at most
$O(N \log N)$ crossings. In order to construct the drawing, we will

1)  decompose  G  into  subgraphs  according  to  the  separator  theorem, 

2)  draw  the  subgraphs  by  recursively  calling  the  procedure,  and 

3)  draw  the  edges  which  link  the  subgraphs  together  without  introducing  too 
many  crossings  and  so  that  every  node  remains  "close"  to  the  exterior  of  the 
drawing. 

In order to illustrate the procedure, we will describe in detail how drawings $D_1$
and $D_2$ of two $m$-node subgraphs are used to construct a drawing $D$ of the
combined $2m$-node subgraph. Let $c(m)$ denote the number of crossings in $D_1$ or $D_2$,
whichever is larger. Further let $d(m)$ denote the maximum number of edges which
must be crossed in order to draw an edge from any node in $D_1$ or $D_2$ to the
exterior of $D_1$ and $D_2$. Construct $D$ from the drawings $D_1$ and $D_2$ by drawing
in the $O(m^{1/2})$ edges which link them together in the best way possible. Now let
$c(2m)$ and $d(2m)$ be the obvious values for the constructed drawing $D$. It is not
difficult to show that

$$c(2m) \le 2c(m) + O(m) + O(m^{1/2} d(m))$$

and that

$$d(2m) \le d(m) + O(m^{1/2}).$$

Solving the recurrences in general, we find that $d(m) \le O(m^{1/2})$ and thus that
$c(m) \le O(m \log m)$. Thus the above procedure can be used to find a drawing for $G$
with at most $O(N \log N)$ crossings □
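
The growth rates claimed for these two recurrences can be observed numerically. The Python sketch below is an illustration added here, not part of the proof; it assumes that every hidden constant equals `const` (1 by default), that equality holds at each step, and that a one-node drawing has $c(1) = d(1) = 0$. The function name is hypothetical.

```python
from math import log2, sqrt

def solve_recurrences(levels: int, const: float = 1.0):
    """Iterate c(2m) = 2c(m) + const*m + const*sqrt(m)*d(m) and
    d(2m) = d(m) + const*sqrt(m), starting from c(1) = d(1) = 0."""
    m, c, d = 1, 0.0, 0.0
    rows = [(m, c, d)]
    for _ in range(levels):
        c, d = 2 * c + const * m + const * sqrt(m) * d, d + const * sqrt(m)
        m *= 2
        rows.append((m, c, d))
    return rows

for m, c, d in solve_recurrences(20)[1:]:
    # Both normalized values stay bounded as m grows.
    print(m, round(d / sqrt(m), 3), round(c / (m * log2(m)), 3))
```

With these constants $d(m)/m^{1/2}$ approaches $1/(\sqrt{2}-1)$ and $c(m)/(m \log m)$ remains below 2, consistent with $d(m) \le O(m^{1/2})$ and $c(m) \le O(m \log m)$.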

Using  the  preceding  result,  we  can  substantially  generalize  Theorem  6-1. 

Theorem 7-9: Any $N$-node graph with an $O(N^{1/2})$-separator can be embedded in
an $O(N \log N)$-node planar graph.

Proof: Construct a drawing of the graph with $O(N \log N)$ crossings according to
the method described in the proof of Theorem 7-8. Replace each edge crossing in
the drawing with an artificial node. The resulting graph has $O(N \log N)$ nodes, is
planar and embeds the original graph □
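
The replacement of crossings by artificial nodes can be sketched in code. The Python below is a minimal illustration added here, not the construction of Theorem 7-8 itself: it assumes a straight-line drawing with integer coordinates in which no three edges pass through a common interior point and no two edges overlap along a segment. The names `crossing_point` and `planarize` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def crossing_point(p, q, r, s):
    """Return the interior crossing point of segments pq and rs, or None."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, r, s
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if denom == 0:
        return None  # parallel segments are treated as non-crossing
    t = Fraction((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3), denom)
    u = Fraction((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1), denom)
    if 0 < t < 1 and 0 < u < 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None

def planarize(positions, edges):
    """Replace every crossing by an artificial node and split the edges there."""
    on_edge = {e: [] for e in edges}
    artificial = {}
    for e, f in combinations(edges, 2):
        if set(e) & set(f):
            continue  # edges sharing an endpoint do not cross
        pt = crossing_point(positions[e[0]], positions[e[1]],
                            positions[f[0]], positions[f[1]])
        if pt is not None:
            name = artificial.setdefault(pt, ("x", len(artificial)))
            on_edge[e].append((pt, name))
            on_edge[f].append((pt, name))
    new_edges = []
    for a, b in edges:
        ax, ay = positions[a]
        # Order the artificial nodes along the edge by distance from endpoint a.
        chain = sorted(on_edge[(a, b)],
                       key=lambda item: (item[0][0] - ax) ** 2 + (item[0][1] - ay) ** 2)
        path = [a] + [name for _, name in chain] + [b]
        new_edges.extend(zip(path, path[1:]))
    return list(positions) + list(artificial.values()), new_edges

# K_4 drawn on the corners of a square: the two diagonals cross once.
square = {0: (0, 0), 1: (2, 0), 2: (2, 2), 3: (0, 2)}
k4_edges = list(combinations(range(4), 2))
nodes, planar_edges = planarize(square, k4_edges)
print(len(nodes), len(planar_edges))
```

Each crossing adds one node and two edges, so a drawing with $O(N \log N)$ crossings yields a planar graph with $O(N \log N)$ nodes, as the proof states.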


CHAPTER  8 


WIRE  AREA  ARGUMENTS 


In  this  chapter,  we  extend  the  method  of  section  7.2  to  prove  lower  bounds  on 
the  wire  area  of  a  variety  of  networks.  In  each  proof,  we  will  use  a  layout  of  a 
network  to  produce  a  layout  for  the  complete  graph.  By  showing  that  the  nodes  of 
the  layout  are  widely  spread  out,  we  will  be  able  to  conclude  that  the  wire  area  of 
the  layout  for  the  complete  graph  is  very  large.  Provided  that  the  edges  of  the 
original  network  were  not  traced  over  too  many  times,  we  can  then  reason  that  the 
wire  area  of  the  original  network  is  also  large. 

8.1 Lower Bounds for the 2-Dimensional Mesh of Trees

In  this  section,  we  find  tight  lower  bounds  for  the  layout  area  and  maximum 
edge  length  of  the  2-dimensional  mesh  of  trees. 

Theorem 8-1: The wire area of the $N$-node 2-dimensional mesh of trees is at least
$\Omega(N \log^2 N)$.

Proof: As usual, we denote the $n \times n$ mesh of trees by $M_{2,n}$. In addition, let
$w(n)$ denote the wire area of $M_{2,n}$ and let $a$ be a positive constant such that

(*) $a \le n/(4\log^2 n)$ for all $n \ge 2$, and

(**) $a \le 2^{2i-20}/(\beta^2 i^6)$ for all $i \ge 1$

where $\beta = \sum_{i=1}^{\infty} i^{-2}$ is also a constant. Clearly such a constant exists ($a = 2^{-30}$ should
suffice) and clearly $w(n) \ge a n^2 \log^2 n$ for $n = 1$ and 2. Consider a value of $n \ge 4$
which is a power of 2 and assume that for all values of $m < n$ which are powers of 2
that $w(m) \ge a m^2 \log^2 m$. We will use induction to show that $w(n) \ge a n^2 \log^2 n$.
Since $M_{2,n}$ has $N = \Theta(n^2)$ nodes, this will be sufficient to prove the theorem.
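
The claim that $a = 2^{-30}$ suffices can be checked numerically. The Python sketch below is an illustration added here under stated assumptions: condition (*) is read as $a \le n/(4\log^2 n)$ and condition (**) as $a \le 2^{2i-20}/(\beta^2 i^6)$, which is what the later steps of the proof require; logarithms are taken base 2; and only finite ranges of $n$ and $i$ are scanned, beyond which both right-hand sides are increasing. The function names are hypothetical.

```python
from math import log2, pi

BETA = pi**2 / 6  # value of the sum of i^-2 over all i >= 1

def star_limit(n: int) -> float:
    """Right-hand side of condition (*), as read above."""
    return n / (4 * log2(n) ** 2)

def double_star_limit(i: int) -> float:
    """Right-hand side of condition (**), as read above."""
    return 2.0 ** (2 * i - 20) / (BETA**2 * i**6)

a = 2.0 ** -30
tightest_star = min(star_limit(n) for n in range(2, 4097))
tightest_double_star = min(double_star_limit(i) for i in range(1, 200))
print(a, tightest_star, tightest_double_star)
```

Under this reading the binding constraint is (**) at $i = 4$, which still leaves a comfortable margin above $2^{-30}$.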

Consider any layout for $M_{2,n}$ which uses $w(n)$ wire. Partition the layout into
three vertical strips $V_0$, $V_1$ and $V_2$ so that the center strip contains $3n^2/4$ leaves
and each outer strip contains $n^2/8$ leaves. Similarly partition the layout into three
horizontal strips $H_0$, $H_1$ and $H_2$ (see Figure 8-1).


Figure 8-1: Partitioning of the layout for $M_{2,n}$.


Let $p$ denote the length of the longest side of the center block formed by the
intersection of $V_1$ and $H_1$. Without loss of generality, we assume that the longest
side is horizontal. In what follows, we will show that $p \ge (a^{1/2} n \log n)/8$.

Since each of the regions $V_0 \cap H_1$ and $V_2 \cap H_1$ can contain at most $n^2/8$
leaves, it is clear that $V_1 \cap H_1$ contains at least $n^2/2$ leaves. Consider the $n^{3/2}$
subgraphs of $M_{2,n}$ produced by eliminating the top $(3\log n)/4$ levels of the row
and column binary trees of $M_{2,n}$. Each of these subgraphs is isomorphic to
$M_{2,n^{1/4}}$. By the pigeonhole principle, at least 1/2 of these subgraphs have at least
one leaf in $V_1 \cap H_1$. If $p < (a^{1/2} n \log n)/8$ (otherwise we are done), then at most
$4p < (a^{1/2} n \log n)/2$ edges can cross the boundary of $V_1 \cap H_1$. Thus at most
$(a^{1/2} n \log n)/2$ of the subproblems which have at least one leaf in $V_1 \cap H_1$ can
have some node or part of an edge outside $V_1 \cap H_1$. This means that at least
$(n^{3/2} - a^{1/2} n \log n)/2$ copies of $M_{2,n^{1/4}}$ are wholly contained in $V_1 \cap H_1$.
Applying the inductive hypothesis, we conclude that $V_1 \cap H_1$ contains at least

$$(n^{3/2} - a^{1/2} n \log n)\, w(n^{1/4})/2 \ge (a n^2 \log^2 n - a^{3/2} n^{3/2} \log^3 n)/32 \ge (a n^2 \log^2 n)/64$$

wire. (The last inequality follows trivially from (*).) Thus $V_1 \cap H_1$ has at least
$(a n^2 \log^2 n)/64$ area and $p \ge (a^{1/2} n \log n)/8$, as claimed.

We next use the layout for $M_{2,n}$ to construct a drawing for the complete graph
on $n^2$ nodes (namely, the $n^2$ leaves of $M_{2,n}$). No matter how the edges of the
complete graph are drawn in the plane (e.g., they may cross or overlap), it is clear
from Figure 8-1 that the sum of the lengths of all the edges (as measured in
Euclidean space) is at least $n^4 p/64 \ge (a^{1/2} n^5 \log n)/2^9$. This is due to the fact
that $n^4/64$ edges pass from region $V_0$ to region $V_2$ and that these regions are
separated by a distance $p$.

Let $L_i$ denote the sum of the lengths of the edges in the $i$th levels of the binary
trees of $M_{2,n}$. Since every level $i$ edge is traced over at most $n^3 2^{-i}$ times in the
drawing of the complete graph, we can conclude that

$$\sum_{i=1}^{\log n} n^3 2^{-i} L_i \ge (a^{1/2} n^5 \log n)/2^9$$

and thus that

$$\sum_{i=1}^{\log n} L_i 2^{-i} \ge (a^{1/2} n^2 \log n)/2^9.$$

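In particular, this means that

$$L_i \ge (a^{1/2} n^2 \log n\, 2^i)/(2^9 \beta i^2)$$

for some $i \le \log n$. (Recall that $\beta = \sum_{i=1}^{\infty} i^{-2}$.) Otherwise,

$$L_i < (a^{1/2} n^2 \log n\, 2^i)/(2^9 \beta i^2)$$

for $1 \le i \le \log n$ and thus

$$\sum_{i=1}^{\log n} L_i 2^{-i} < \sum_{i=1}^{\log n} (a^{1/2} n^2 \log n)/(2^9 \beta i^2) \le (a^{1/2} n^2 \log n)/2^9,$$

a contradiction.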
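
The averaging step used here is generic: whenever nonnegative terms $x_i$ (here $x_i = L_i 2^{-i}$) sum to at least $S$, some index satisfies $x_i \ge S/(\beta i^2)$, because the thresholds $S/(\beta i^2)$ over finitely many indices sum to less than $S$. The Python sketch below is an illustration added here with a hypothetical function name; it simply searches for such an index on random inputs.

```python
import random
from math import pi

BETA = pi**2 / 6  # sum of i^-2 over all i >= 1

def heavy_level(weights, total):
    """Return the first i (1-based) with weights[i-1] >= total / (BETA * i^2)."""
    for i, weight in enumerate(weights, start=1):
        if weight >= total / (BETA * i * i):
            return i
    return None

random.seed(0)
for _ in range(1000):
    weights = [random.random() for _ in range(random.randint(1, 20))]
    # A heavy level always exists when total is the actual sum of the weights.
    assert heavy_level(weights, sum(weights)) is not None
```

Weighting the thresholds by $1/i^2$ rather than uniformly is what lets the later recurrence absorb the $-2ain^2\log n$ loss term for every possible $i$.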

Using the straightforward relation

$$w(n) \ge 2^{2i} w(n/2^i) + L_i$$

where $i$ has been chosen so that

$$L_i \ge (a^{1/2} n^2 \log n\, 2^i)/(2^9 \beta i^2),$$

we can conclude that

$$w(n) \ge 2^{2i} a (n/2^i)^2 (\log n - i)^2 + (a^{1/2} n^2 \log n\, 2^i)/(2^9 \beta i^2)$$

$$\ge a n^2 \log^2 n - 2 a i n^2 \log n + (a^{1/2} n^2 \log n\, 2^i)/(2^9 \beta i^2)$$

$$\ge a n^2 \log^2 n.$$

(The last inequality follows trivially from (**).) Thus $w(n) \ge \Omega(n^2 \log^2 n)$ for all $n$ □

Theorem 8-2: Any layout of the $N$-node 2-dimensional mesh of trees contains a
wire of length at least $\Omega(N^{1/2} \log N/\log\log N)$.

Proof: It is sufficient to show that any layout for $M_{2,n}$ contains a wire of length
at least $\Omega(n \log n/\log\log n)$. Assume for the purposes of contradiction that this is not
the case and consider a layout of $M_{2,n}$ for which the longest wire has length
$q \le o(n \log n/\log\log n)$. Using arguments similar to those used to prove
Theorem 5-2, we first show that (without loss of generality) the area of such a
layout is at most $O(q^2 \log^2 n) \le o(n^2 \log^4 n)$.

Since every pair of nodes of $M_{2,n}$ is linked by a path of length at most $4\log n$, all
of the nodes in the layout are contained in a $4q\log n \times 4q\log n$ square. At most
$16q\log n$ wires may leave and re-enter the square at various points along its
boundary. Without increasing the lengths of any of these wires, it is possible to
rewire the segments outside the square using at most $O(q^2 \log^2 n)$ additional area.
Thus, the resulting layout for $M_{2,n}$ will have maximum edge length $q$ and area at
most $O(q^2 \log^2 n)$.

The proof is completed by observing that any layout of $M_{2,n}$ with area at most
$o(n^2 \log^4 n)$ must have a wire of length at least $\Omega(n \log n/\log\log n)$. From the proof
of Theorem 8-1, we know that $\sum_{i=1}^{\log n} L_i 2^{-i} \ge (a^{1/2} n^2 \log n)/2^9$. Thus either

1) there is an $i \le 4\log\log n$ such that $L_i \ge (a^{1/2} n^2 \log n\, 2^i)/(2^{12} \log\log n)$, or

2) there is an $i > 4\log\log n$ such that $L_i \ge (a^{1/2} n^2 \log n\, 2^i)/(2^{10} \beta i^2)$

where, as before, the constant $\beta = \sum_{i=1}^{\infty} i^{-2}$. Otherwise,

$$\sum_{i=1}^{\log n} L_i 2^{-i} = \sum_{i=1}^{4\log\log n} L_i 2^{-i} + \sum_{i=4\log\log n + 1}^{\log n} L_i 2^{-i}$$

$$< (a^{1/2} n^2 \log n)/2^{10} + ((a^{1/2} n^2 \log n)/2^{10}\beta) \sum_{i=1}^{\infty} i^{-2}$$

$$\le (a^{1/2} n^2 \log n)/2^9,$$

a contradiction.


The second condition cannot possibly be true, however. If it were, the area of
the layout would be at least

$$L_i \ge \Omega(n^2 \log n\, 2^i/i^2) \ge \Omega(n^2 \log^5 n/(\log\log n)^2) \ge \Omega(n^2 \log^4 n),$$

a contradiction.

Thus the first condition must be true and there is an $i$ such that
$L_i \ge \Omega(n^2 \log n\, 2^i/\log\log n)$. Since there are $n 2^{i+1}$ type $i$ edges in $M_{2,n}$, we can conclude
that at least one of them has length at least $\Omega(n \log n/\log\log n)$ □

8.2  Lower  Bounds  for  the  Tree  of  Meshes 

Using  the  results  of  the  previous  section,  it  is  easy  to  demonstrate  the  existence 
of planar graphs which cannot be laid out in linear area and which must have long
wires.  In  particular,  we  can  conclude  the  following. 

Theorem 8-3: The wire area of the $N$-node tree of meshes is at least $\Omega(N \log N)$.

Proof: As we showed in section 6.3.3b, the $N$-node 2-dimensional mesh of trees
can be embedded in an $O(N \log N)$-node tree of meshes. By Theorem 8-1, we can
thus conclude that the wire area of the $N \log N$-node tree of meshes is at least
$\Omega(N \log^2 N)$. Equivalently, the wire area of the $N$-node tree of meshes is at least
$\Omega(N \log N)$ □

Theorem 8-4: Any layout of the $N$-node augmented tree of meshes has a wire of
length at least $\Omega(N^{1/2}/\log^{1/2} N)$.

Proof: In the proof of Theorem 8-1, we showed that any layout of $M_{2,n}$ has two
leaves which are spaced at least $\Omega(n \log n)$ distance apart. Since (as we showed in
section 6.3.3b) $M_{2,n}$ can be embedded in $T_{2,n}$ so that the leaves of $M_{2,n}$ are
embedded in the leaves of $T_{2,n}$, we can observe that any layout of $T_{2,n}$ also has
two leaves which are spaced at least $\Omega(n \log n)$ distance apart. Since every pair of
leaves in $T_{2,n}$ are linked by a path of length at most $O(\log n)$ in the augmented tree
of meshes, we can conclude that some edge of the augmented tree of meshes has
length at least $\Omega(n) = \Omega(N^{1/2}/\log^{1/2} N)$ □

Had  we  so  desired,  we  could  have  proved  both  results  directly,  using  arguments 
identical  to  the  ones  used  to  prove  Theorem  8-1. 


8.3  Lower  Bounds  for  a  Restricted  Class  of  Binary  Tree  Layouts 

In [BK80], Brent and Kung considered layouts of $N$-node complete binary trees
for which every leaf is located on the boundary of some convex region. In
particular, they showed that the wire area of any such layout is at least $\Omega(N \log N)$.
Recently, Patterson, Ruzzo and Snyder [PRS81] extended this result by showing
that any such layout with area $A$ must have some edge of length $\Omega(N/\log(A/N))$.
In particular, this means that if $A = O(N \log N)$, then there must be some edge of
length $\Omega(N/\log\log N)$ but that if $A = \Omega(N^{1+\epsilon})$ for some $\epsilon > 0$, then there must
only be an edge of length $\Omega(N/\log N)$. In what follows, we show how to use the
techniques developed in this chapter to give short proofs of these facts.

Theorem  8-5  (Brent  and  Kung  [BK80]):  Any  layout  of  the  N-node  complete 
binary  tree  in  which  every  leaf  is  on  the  boundary  of  some  convex  region  requires 
$\Omega(N \log N)$ area.

Proof: Given any such layout, we first use the methods of section 8.1 to
construct a layout of the complete graph on the $n = \Theta(N)$ leaves of the tree. Since
the leaves are on the boundary of some convex region, it is easily shown that the
layout of $K_n$ uses at least $\Omega(n^3)$ wire.

Let $L_i$ denote the sum of the lengths of the edges in the $i$th level of the tree. As
each $i$th level edge is traced over at most $n^2 2^{-i}$ times, we know that

$$\sum_{i=1}^{\log n} n^2 2^{-i} L_i \ge \Omega(n^3)$$

and thus that $\sum_{i=1}^{\log n} L_i 2^{-i} \ge \Omega(n)$. Using arguments similar to those in the proof of
Theorem 8-1, we can conclude that $L_i \ge \Omega(n 2^i/i^2)$ for at least one value of $i$.
Letting $w(n)$ denote the wire area of the binary tree layout, we can see that

$$w(n) \ge 2^i w(n 2^{-i}) + \Omega(n 2^i/i^2).$$

Solving the recurrence, we find that $w(n) \ge \Omega(n \log n) = \Omega(N \log N)$ □

Theorem 8-6 (Patterson, Ruzzo and Snyder [PRS81]): Any $A$-area layout of the
$N$-node complete binary tree in which every leaf is on the boundary of a convex
region has some edge of length $\Omega(N/\log(A/N))$.

Proof: The proof follows that of the preceding theorem until it is concluded that
$\sum_{i=1}^{\log n} L_i 2^{-i} \ge \Omega(n)$. Using methods similar to those used to prove Theorem 8-2, we
can then observe that one of the following conditions must be satisfied:

1) there is an $i \le 2\log(A/n)$ such that $L_i \ge \Omega(n 2^i/\log(A/n))$, or

2) there is an $i > 2\log(A/n)$ such that $L_i \ge \Omega(n 2^i/i^2)$.

The second condition cannot possibly hold since, if it did, the layout area would
be at least $L_i \ge \Omega(n 2^i/i^2)$ which, for $i \ge 2\log(A/n)$, means that

$$A \ge \Omega(A^2/(n \log^2(A/n))) > \Omega(A),$$

a contradiction.

Thus the first condition holds and we can conclude that there is an $i$ such that
$L_i \ge \Omega(n 2^i/\log(A/n))$. As there are only $2^{i+1}$ edges in the $i$th level, at least one of
them must have length at least $\Omega(n/\log(A/n)) = \Omega(N/\log(A/N))$ □


CONCLUSION, INDEX, REFERENCES and ADDENDUM


CONCLUSION 


In  Part  I  of  the  thesis,  we  described  several  new  layouts  for  the  shuffle-exchange 
graph.  In  particular,  we  found 

1) an asymptotically optimal $O(N^2/\log^2 N)$-area layout of the $N$-node shuffle-exchange
graph, and

2)  practical  layouts  for  small  shuffle-exchange  graphs. 

As  a  result,  it  should  now  be  possible  to  construct  large  scale  shuffle-exchange 
chips. The only remaining question is whether or not there is a layout of the
$N$-node shuffle-exchange graph for which every wire has length at most $O(N/\log^2 N)$.
All known layouts have wires of length at least $\Omega(N/\log N)$.

In Part II of the thesis, we described techniques for finding good lower bounds
on  the  crossing  number,  wire  area,  maximum  edge  crossing  and  maximum  edge 
length  of  a  variety  of  VLSI  networks.  In  particular,  we  applied  these  techniques  to 
find 

1) an $N$-node planar graph which has layout area $\Omega(N \log N)$ and maximum
edge length $\Omega(N^{1/2}/\log^{1/2} N)$,

2) an $N$-node graph with an $O(N^{1/2})$-separator which has layout area
$\Omega(N \log^2 N)$ and maximum edge length $\Theta(N^{1/2} \log N/\log\log N)$, and

3) an $N$-node graph with an $O(N^{\alpha})$-separator (for $\alpha > 1/2$) which has maximum
edge length $\Theta(N^{\alpha})$.

Thus  we  have  answered  all  the  open  questions  concerning  bounds  for  layout 
area  and  maximum  edge  length  of  networks  with  known  separators.  We  have  only 
partially  answered  the  corresponding  questions  for  planar  graphs,  however.  In 
particular, it would be of great interest to know whether or not every $N$-node
planar  graph  can  be  laid  out  in  O(NlogN)  area. 


area  of  a  layout  4 
artificial  node  74 
augmented  tree  of  meshes  72 

basic  piece  of  a  necklace  26 
basis  node  15 
bisection  width  52 

complex  plane  diagram  9 
crossing  of  the  first  kind  75 
crossing  of  the  second  kind  76 
crossing  number  5 

degenerate  necklace  10 

diameter  56 

distance  in  a  graph  56 

distinguished  node  of  a  basic  piece  26 

distinguished  node  of  a  necklace  21 

distinguished  node  of  a  primary  piece  26 

distinguished  node  of  a  secondary  piece  26 

even  node  22 
exchange  edge  3 

full  necklace  10 

layout  area  4 
left  edge  65 
level  11 
leveling  18 
level-necklace  grid  12

maximum  edge  crossing  5 
maximum  edge  length  4 
mesh  of  trees  59,  63 
minimum  number  represented  18 

necklace  10 

odd  node  22 

primary  block  of  zeros  21 

primary  node  22 

primary  piece  of  a  necklace  26 


radius  of  a  necklace  18
reverse  edge  31 
right  edge  65 

secondary  block  of  zeros  21 
secondary  node  22 
secondary  piece  of  a  necklace  26 
separator  51 
shift  edge  30 
shuffle  edge  3 
shuffle-exchange  graph  3 
shuffle-shift  graph  31 
shuffle-shift-reverse  graph  31 
shuffle-tree  graph  65 
simple  graph  83 
simultaneous  separator  67 
size  of  a  necklace  15 
size  of  a  node  8 

Thompson  model  2 
track  2 

transpose  edge  32 
tree  of  meshes  65 
type  i  edge  80
type  i-j  crossing  80 

value  of  a  node  9 

wire  area  5 
ADDENDUM


Much has been accomplished during the period of time between the submission
of this thesis to the MIT math department in August of 1981 and the publication of
the thesis as a technical report in June of 1982. In fact, so much has been
discovered in the interim that it would be possible to write several additional theses
on the subject. As an aid to those who wish to know more about the new
material, we have included below a brief bibliography of some of the recent work
on layout strategies for VLSI.